Chapter 8
Accessing HDInsight over Hive and ODBC
If you are a SQL developer and want to cross-pollinate your existing SQL skills into the world of Big Data, Hive is
probably the best place for you to start. This part of the book will enable you to be the Queen Bee of your Hadoop
world with Hive and to gain business intelligence (BI) insights by applying Hive Query Language (HQL) filters and
joins to Hadoop Distributed File System (HDFS) datasets.
Hive provides a schema for the underlying HDFS data and a SQL-like query language to access that data. Simba,
in collaboration with Microsoft, provides an ODBC driver that is the supported and recommended interface for
connecting to HDInsight. It enables client applications to connect to and consume Hive data that resides on top of
your HDFS (in the case of HDInsight, Windows Azure Storage Blob, or WASB). The driver is available as a free download at:
http://www.microsoft.com/en-us/download/details.aspx?id=40886
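To make the connection step concrete, here is a minimal Python sketch that assembles the kind of connection string a client application hands to an ODBC driver manager. DRIVER, UID, and PWD are standard ODBC keywords; the driver name "Microsoft Hive ODBC Driver", the driver-specific Host, Port, and Schema keys, port 443, and the cluster name mycluster are assumptions for illustration, so check them against the documentation installed with the driver. Any authentication-mechanism settings your cluster requires are deliberately left out.

```python
# Sketch: build an ODBC connection string for the Hive ODBC driver.
# ASSUMPTIONS (verify against the driver's documentation): the driver name
# "Microsoft Hive ODBC Driver", the driver-specific keys Host, Port and
# Schema, and port 443 for an HDInsight cluster. DRIVER, UID and PWD are
# standard ODBC connection-string keywords. Authentication-mechanism
# settings are intentionally omitted.


def _quote(value: str) -> str:
    """Brace-wrap a value containing ';' so it is not split into two pairs."""
    if "{" in value or "}" in value:
        raise ValueError("braces inside values are not handled by this sketch")
    return "{" + value + "}" if ";" in value else value


def build_connection_string(host: str, user: str, password: str,
                            schema: str = "default", port: int = 443,
                            driver: str = "Microsoft Hive ODBC Driver") -> str:
    """Return a 'key=value;' ODBC connection string for a Hive endpoint."""
    pairs = [
        ("DRIVER", "{" + driver + "}"),  # driver names are conventionally braced
        ("Host", _quote(host)),
        ("Port", str(port)),
        ("Schema", _quote(schema)),      # "default" is Hive's default database
        ("UID", _quote(user)),
        ("PWD", _quote(password)),
    ]
    return ";".join(f"{key}={value}" for key, value in pairs) + ";"


if __name__ == "__main__":
    # "mycluster" and the credentials are hypothetical placeholders.
    print(build_connection_string("mycluster.azurehdinsight.net", "admin", "secret"))
```

If you use a Python ODBC bridge such as pyodbc, the resulting string can be passed to `pyodbc.connect(conn_str, autocommit=True)`; autocommit is usually needed because the Hive ODBC driver does not support transactions.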
The preceding link offers both 32-bit and 64-bit versions of the Hive ODBC driver. Download the version whose
bitness matches the application that will consume the driver, which is not necessarily the bitness of your operating
system. For example, if you want to consume the driver from the 32-bit version of Excel, you need to install the
32-bit Hive ODBC driver, even on 64-bit Windows.
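For a Python client, the bitness that matters is that of the Python interpreter, not of Windows. The minimal sketch below uses only the standard library: `struct.calcsize("P")` returns the size in bytes of a native pointer in the running process. The helper names `process_bitness` and `matching_driver_label` are hypothetical.

```python
import struct


def process_bitness() -> int:
    """Return 32 or 64: the pointer width of the running Python process."""
    # "P" is the native void* format, so its size reflects the process, not the OS.
    return struct.calcsize("P") * 8


def matching_driver_label(bitness: int) -> str:
    """Name the Hive ODBC driver build that a process of this bitness can load."""
    if bitness not in (32, 64):
        raise ValueError(f"unexpected bitness: {bitness}")
    return f"{bitness}-bit Hive ODBC driver"


if __name__ == "__main__":
    bits = process_bitness()
    print(f"This Python process is {bits}-bit; install the {matching_driver_label(bits)}.")
```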
This chapter shows you how to create a basic schema structure in Hive, load data into that schema, and access
the data using the ODBC driver from a client application.
Hive: The Hadoop Data Warehouse
Hive is a framework that sits on top of core Hadoop. It acts as a data-warehousing system over HDFS and provides
easy query mechanisms for the underlying HDFS data. If you revisit the Hadoop ecosystem diagram from Chapter 1, you
can see that Hive sits right on top of the Hadoop core, as shown in Figure 8-1.
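The idea of Hive projecting a schema onto files that already sit in HDFS (often called schema on read) can be illustrated with a toy Python model. This is not how Hive is implemented: it is a sketch in which raw delimited lines are left untouched and a schema of (column, converter) pairs is applied only when rows are read, followed by a WHERE-style filter. The Ctrl-A (`\x01`) delimiter matches Hive's default for text tables; the sample rows and the `sales` table named in the comment are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

# Hive's default field delimiter for text-format tables is Ctrl-A (\x01).
FIELD_DELIM = "\x01"

# A "schema" here is an ordered list of (column name, converter) pairs.
Schema = List[Tuple[str, Callable[[str], Any]]]


def read_with_schema(lines: Iterable[str], schema: Schema,
                     delim: str = FIELD_DELIM) -> Iterator[Dict[str, Any]]:
    """Project a schema onto raw delimited lines at read time.

    The raw lines are never modified; each one is split and converted only
    when it is read. Fields that are missing or fail conversion become None,
    loosely mirroring how Hive returns NULL for such values.
    """
    for line in lines:
        fields = line.rstrip("\n").split(delim)
        row: Dict[str, Any] = {}
        for i, (name, convert) in enumerate(schema):
            if i >= len(fields):
                row[name] = None  # missing trailing column -> NULL
                continue
            try:
                row[name] = convert(fields[i])
            except ValueError:
                row[name] = None  # unparseable value -> NULL
        yield row


# Hypothetical raw file contents, as they might sit in HDFS/WASB.
RAW_LINES = [
    "1\x01Seattle\x01120.5\n",
    "2\x01Redmond\x01n/a\n",
    "3\x01Bellevue\n",
]
SCHEMA: Schema = [("id", int), ("city", str), ("amount", float)]

if __name__ == "__main__":
    # Roughly: SELECT city FROM sales WHERE amount > 100  (hypothetical table)
    rows = read_with_schema(RAW_LINES, SCHEMA)
    print([r["city"] for r in rows if r["amount"] is not None and r["amount"] > 100])
```

Because the schema lives apart from the data, the same lines could be read through a different schema without rewriting the file, which is the property that lets Hive layer tables over existing HDFS datasets.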
 