5. Create tables to hold the processed data. The sensor table is partitioned
by date (the dt column):
CREATE TABLE sensor(time String, target_tmp Int,
actual_tmp Int, delta_tmp Int, building_id Int)
PARTITIONED BY (dt String);
CREATE TABLE building(building_id Int,
building_age Int, hvac_type String);
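Each distinct value of dt becomes its own partition (a separate directory under the table's storage location). Once data has been loaded later in this exercise, you can list the partitions Hive created; for example:
SHOW PARTITIONS sensor;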
6. Because you will join the tables on the building ID, create indexes on
this column for both tables:
CREATE INDEX Building_IDX_1 ON TABLE sensor (building_id)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;
CREATE INDEX Building_IDX_2 ON TABLE building (building_id)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;
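Both indexes are created WITH DEFERRED REBUILD, which means they are empty until explicitly built. After the tables contain data, rebuild each index so Hive can actually use it during the join; for example:
ALTER INDEX Building_IDX_1 ON sensor REBUILD;
ALTER INDEX Building_IDX_2 ON building REBUILD;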
Now you can use Pig to extract, transform, and load the data from
staging tables into the analysis tables.
7. Open the Hadoop command line and browse to the bin directory of the Pig
installation folder. Issue the following command to launch the Pig CLI,
passing in the switch that enables HCatalog:
pig.cmd -useHCatalog
This launches Grunt, the interactive Pig shell.
8. Issue the following Pig Latin script to load the data from the staging
table:
SensorData = load 'sensor_stg'
using org.apache.hcatalog.pig.HCatLoader();
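Because HCatLoader reads the table definition from HCatalog, the SensorData relation already carries the column names and types declared in Hive. You can confirm the schema from the Grunt prompt; for example:
describe SensorData;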