One caveat to note is that Hive matches dynamic partition columns by
position, not by name, so they must be the last columns in the SELECT
clause, in the same order as they appear in the PARTITION clause.
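The ordering rule can be illustrated with a small Python helper that
assembles a dynamic-partition INSERT statement. This is only a sketch: the
function name and the table and column names are hypothetical and do not
come from the exercise data.

```python
def build_dynamic_partition_insert(target, source, data_cols, partition_cols):
    """Build a HiveQL INSERT that relies on dynamic partitioning.

    The partition columns are appended to the end of the SELECT list, in
    the same order as in the PARTITION clause, because Hive pairs them up
    by position rather than by name.
    """
    if not partition_cols:
        raise ValueError("at least one partition column is required")
    select_list = list(data_cols) + list(partition_cols)
    return (
        f"INSERT OVERWRITE TABLE {target} "
        f"PARTITION ({', '.join(partition_cols)}) "
        f"SELECT {', '.join(select_list)} FROM {source}"
    )


# Hypothetical table and column names, for illustration only.
statement = build_dynamic_partition_insert(
    target="sales_by_region",
    source="sales_staging",
    data_cols=["order_id", "amount"],
    partition_cols=["country", "state"],
)
print(statement)
```

Swapping country and state in the SELECT list would not raise an error in
Hive; the values would simply land in the wrong partitions, which is why
the helper derives both clauses from the same list.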
Integrating HCatalog with Pig and Hive
Although originally designed to provide the metadata store for Hive,
HCatalog's role has greatly expanded in the Hadoop ecosystem. It integrates
with other tools and supplies read and write interfaces for Pig and
MapReduce. It also integrates with Sqoop, which is a tool designed to
transfer data back and forth between Hadoop and relational databases such
as SQL Server and Oracle. HCatalog also exposes a REST interface so that
you can create custom tools and applications to interact with Hadoop data
structures. In addition, HCatalog contains a notification service so that it
can notify workflow tools such as Oozie when data has been loaded or
updated.
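The REST interface mentioned above is known as WebHCat (formerly
Templeton). As a rough sketch, the following Python builds the URL that
describes a single table. The host, user, and table name are hypothetical,
and port 50111 is assumed to be the WebHCat default for your installation.

```python
from urllib.parse import quote, urlencode


def webhcat_table_url(host, database, table, user, port=50111):
    """Return the WebHCat URL that describes one HCatalog table."""
    path = (
        "/templeton/v1/ddl/database/"
        f"{quote(database, safe='')}/table/{quote(table, safe='')}"
    )
    # WebHCat identifies the caller through the user.name query parameter.
    query = urlencode({"user.name": user})
    return f"http://{host}:{port}{path}?{query}"


# Hypothetical host, table, and user, for illustration only.
url = webhcat_table_url("localhost", "default", "hvac_staging", "hadoop")
print(url)
```

Issuing an HTTP GET against this URL on a running cluster (for example with
urllib.request.urlopen) should return a JSON description of the table. The
sketch stops at building the URL so that it runs without a cluster.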
Another key feature of HCatalog is that it allows developers to share data
and structures across internal toolsets like Pig and Hive. You do not have to
explicitly define the data structures in each program, which lets you use the
right tool for the right job. For example, you can load data into Hadoop using
HCatalog, perform some ETL on the data using Pig, and then aggregate the
data using Hive. After the processing, you can send the data to your
data warehouse housed in SQL Server using Sqoop. You can even automate
the process using Oozie.
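The workflow just described can be sketched as an ordered list of command
lines, one per tool. The script file names, JDBC URL, and table names below
are hypothetical, and the sketch assumes the hcat, pig, hive, and sqoop
command-line tools are on the PATH. It only builds the commands and does
not run them.

```python
def pipeline_commands(jdbc_url, export_table, hcat_table):
    """Return (stage, argv) pairs for a define/ETL/aggregate/export flow."""
    return [
        # Define the shared table structures once, in HCatalog.
        ("define", ["hcat", "-f", "create_tables.hcat"]),
        # Pig reads and writes those tables when started with -useHCatalog.
        ("etl", ["pig", "-useHCatalog", "-f", "clean_sensor_data.pig"]),
        # Hive sees the same table definitions for the aggregation step.
        ("aggregate", ["hive", "-f", "aggregate_by_building.hql"]),
        # Sqoop exports the HCatalog table to the relational warehouse.
        ("export", [
            "sqoop", "export",
            "--connect", jdbc_url,
            "--table", export_table,
            "--hcatalog-table", hcat_table,
        ]),
    ]


# Hypothetical connection string and table names, for illustration only.
stages = pipeline_commands(
    jdbc_url="jdbc:sqlserver://dbhost:1433;databaseName=warehouse",
    export_table="building_temps",
    hcat_table="building_temps_agg",
)
for stage, argv in stages:
    print(f"{stage}: {' '.join(argv)}")
```

None of the stages repeats a schema definition, because each tool looks up
the table structure in HCatalog. In practice a workflow tool such as Oozie
would schedule these steps rather than a hand-written script.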
To complete the following exercise, you need to download and install
HDP for Windows from Hortonworks. You can set up HDP for Windows on
a development server to provide a local test environment that supports a
single-node deployment. (For a detailed discussion of installing the Hadoop
development environment on Windows, see
http://hortonworks.com/products/hdp-windows/.)
In this exercise, we analyze sensor data collected from HVAC systems
monitoring the temperatures of buildings. You can download the sensor
data from http://www.wiley.com/go/microsoftbigdatasolutions. There
should be two files: one containing sensor data (HVAC.csv) and one
containing building information (building.csv). After extracting the files,
load the data into a staging table using HCatalog and Hive:
1. Open the Hive CLI. Because Hive and HCatalog are so tightly coupled,
you can write HCatalog commands directly in the Hive CLI. As a matter