Quick reference of Hive commands I used often last time.
Create external table
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 |
-- Create all tables -- S3 root -- set rootDir = s3://my-bucket/data/output/; set rootDir = s3://my-bucket/output/data/datatype=gps/; ----------------------------------- ----------------------------------- -- Trip table ----------------------------------- ----------------------------------- set tripDir = ${hiveconf:rootDir}tablename=trip; CREATE external TABLE trip ( trip_id STRING, probe_id STRING, provider_id STRING, -- 1 is moving, 0 is stopped is_moving TINYINT, is_completed BOOLEAN, -- UTC time start_time TIMESTAMP, -- UTC time start_lat DOUBLE, start_lon DOUBLE, -- Adjusted for anonymity purposes start_lat_adj DOUBLE, start_lon_adj DOUBLE, -- UTC time end_time TIMESTAMP, -- UTC time end_lat DOUBLE, end_lon DOUBLE, -- Adjusted for anonymity purposes end_lat_adj DOUBLE, end_lon_adj DOUBLE, trip_mean_speed DOUBLE, trip_max_speed DOUBLE, trip_distance_m DOUBLE, -- 0 - walk, 1 - car mode TINYINT, -- Demographic data specific for this trip trip_demographic STRUCT< is_start_home: BOOLEAN, is_end_home: BOOLEAN, other: ARRAY<STRUCT< name: String, value:String>> >, -- key points along the trip to enable differentiating between travel options path ARRAY<STRUCT< probe_point_id: BIGINT, probe_id: string, cell_id: string, provider_id: string, -- 1 is moving, 0 is stopped is_moving: TINYINT, arrival_time: TIMESTAMP, capture_time: TIMESTAMP, lat: DOUBLE, lon: DOUBLE >> ) PARTITIONED BY (year INT,month INT,day INT) STORED AS TEXTFILE LOCATION '${hiveconf:tripDir}'; |
Add or update partition
1 2 |
ALTER TABLE gps_input PARTITION(provider="acsoft",year = 2014, month = 10, day = 01) SET LOCATION 's3://my-bucket/staging/provider=acsoft/year=2014/month=10/day=01'; |
Load data to partitions
1 |
ALTER TABLE gps_input RECOVER PARTITIONS; |
Recover partitions sometimes didn’t work for me, but the following worked:
1 |
MSCK REPAIR TABLE gps_input; |