Accessing HDInsight over Hive and ODBC - Pro Microsoft HDInsight: Hadoop on Windows


Chapter 8

Accessing HDInsight over Hive and ODBC

If you are a SQL developer and want to cross-pollinate your existing SQL skills into the world of Big Data, Hive is probably the best place for you to start. This chapter will enable you to be the Queen Bee of your Hadoop world with Hive and to gain business intelligence (BI) insights by applying Hive Query Language (HQL) filters and joins to Hadoop Distributed File System (HDFS) datasets.

Hive provides a schema over the underlying HDFS data and a SQL-like query language to access that data. Simba, in collaboration with Microsoft, provides an ODBC driver that is the supported and recommended interface for connecting to HDInsight. It enables client applications to connect to and consume Hive data that resides on top of your HDFS (WASB, in the case of HDInsight). The driver is available as a free download at:

The preceding link has both the 32-bit and 64-bit Hive ODBC drivers available for download. You should download the appropriate version of the driver for your operating system and for the application that will consume the driver, and be sure to match the application's bitness. For example, if you want to consume the driver from the 32-bit version of Excel, you will need to install the 32-bit Hive ODBC driver.
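The same rule applies when the consuming application is a script rather than Excel, because the interpreter process is what loads the driver. As a minimal sketch, assuming the client is a Python script (Python does not appear in the original text and is used here only for illustration), the following reports the bitness of the running process. The function names are hypothetical.

```python
import struct


def process_bitness() -> int:
    """Return the pointer width, in bits, of the running Python process."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return struct.calcsize("P") * 8


def required_driver(bitness: int) -> str:
    """Name the Hive ODBC driver build that a process of this bitness can load."""
    if bitness not in (32, 64):
        raise ValueError(f"unexpected bitness: {bitness}")
    return f"{bitness}-bit Hive ODBC driver"


if __name__ == "__main__":
    print(required_driver(process_bitness()))
```

Note that a 32-bit application running on 64-bit Windows still needs the 32-bit driver, which is why the check looks at the process and not at the operating system.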

This chapter shows you how to create a basic schema structure in Hive, load data into that schema, and access the data using the ODBC driver from a client application.
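Those three steps can be sketched from the client side as follows. This is an illustration under stated assumptions, not the chapter's own walkthrough: the table name weblogs, its columns, the /example/weblogs path, and the DSN name HiveDSN are all hypothetical, and the sketch assumes the third-party pyodbc package plus a DSN already configured for the Hive ODBC driver. DSN, UID, and PWD are standard ODBC connection-string keywords.

```python
# HiveQL statements held as strings. The table, columns, and path are
# hypothetical and exist only to illustrate the three steps.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS weblogs (
    log_date STRING,
    url      STRING,
    hits     INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
"""

LOAD_DATA = "LOAD DATA INPATH '/example/weblogs' INTO TABLE weblogs"

TOP_URLS = """
SELECT url, SUM(hits) AS total_hits
FROM weblogs
GROUP BY url
ORDER BY total_hits DESC
LIMIT 10
"""


def build_connection_string(dsn: str, user: str, password: str) -> str:
    """Join standard ODBC keywords into a DSN-based connection string."""
    # Assumes the values contain no semicolons; no escaping is attempted.
    return f"DSN={dsn};UID={user};PWD={password}"


def run_steps(conn_str: str):
    """Create the table, load data into it, and return the query rows."""
    import pyodbc  # third-party; imported lazily so this module loads without it

    # autocommit=True is an assumption: Hive ODBC drivers generally do not
    # support transactions, so the connection is opened without one.
    conn = pyodbc.connect(conn_str, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(CREATE_TABLE)  # step 1: define the schema
        cursor.execute(LOAD_DATA)     # step 2: load data into it
        cursor.execute(TOP_URLS)      # step 3: query over ODBC
        return cursor.fetchall()
    finally:
        conn.close()
```

A call such as run_steps(build_connection_string("HiveDSN", "admin", "secret")) would run all three steps against the cluster behind that DSN. The credentials shown are placeholders.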

Hive: The Hadoop Data Warehouse

Hive is a framework that sits on top of core Hadoop. It acts as a data-warehousing system on top of HDFS and provides easy query mechanisms for the underlying HDFS data. If you revisit the Hadoop Ecosystem diagram in Chapter 1, you can see that Hive sits right on top of Hadoop core, as shown in Figure 8-1.
