5. Create tables to hold the processed data. The sensor table is partitioned
by date (the dt column):
CREATE TABLE sensor(time String, target_tmp Int,
actual_tmp Int, delta_tmp Int, building_id Int)
PARTITIONED BY (dt String);
CREATE TABLE building(building_id Int,
building_age Int, hvac_type String);
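Each distinct value of dt becomes its own partition (a separate directory under the table's storage location). Once data has been loaded later in this exercise, you can list the partitions Hive created; for example:
SHOW PARTITIONS sensor;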
6. Because you will join the tables on the building ID, create indexes on
this column for both tables:
CREATE INDEX Building_IDX_1 ON TABLE sensor (building_id)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;
CREATE INDEX Building_IDX_2 ON TABLE building (building_id)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;
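Both indexes are created WITH DEFERRED REBUILD, which means they are empty until explicitly built. After the tables contain data, rebuild each index so Hive can actually use it during the join; for example:
ALTER INDEX Building_IDX_1 ON sensor REBUILD;
ALTER INDEX Building_IDX_2 ON building REBUILD;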
Now you can use Pig to extract, transform, and load the data from
staging tables into the analysis tables.
7. Open the Hadoop command line and browse to the bin directory of the Pig
installation folder. Issue the following command to launch the Pig CLI,
passing in the switch that enables HCatalog:
pig.cmd -useHCatalog
This launches Grunt, the interactive Pig shell.
8. Issue the following Pig Latin script to load the data from the staging
table:
SensorData = load 'sensor_stg'
using org.apache.hcatalog.pig.HCatLoader();
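Because HCatLoader reads the table definition from HCatalog, the SensorData relation already carries the column names and types declared in Hive. You can confirm the schema from the Grunt prompt; for example:
describe SensorData;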