Listing 8-10. Selecting all columns
SELECT * FROM stock_analysis;
Note: In cases where selecting only a few columns greatly reduces the amount of data to transfer, it can still be
worthwhile to select only those columns.
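As a minimal sketch of the point in the note, the query below projects two columns instead of using *. The column names stock_symbol and stock_price_close are hypothetical and are not taken from the table definition; substitute the columns that actually exist in stock_analysis.

```sql
-- Hypothetical column names, shown for illustration only.
-- Projecting specific columns avoids returning every column of the table.
SELECT stock_symbol, stock_price_close
FROM stock_analysis;
```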
In addition to common SQL semantics, HiveQL supports custom MapReduce scripts embedded in a query through the
MAP and REDUCE clauses, as well as custom User Defined Functions (UDFs) implemented in Java. This extensibility
enables you to use HiveQL to perform complex transformations on data as it is queried.
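The following sketch illustrates both extension points. It assumes two hypothetical scripts (stock_mapper.py and stock_reducer.py), a hypothetical target table stock_summary, hypothetical column names, and a hypothetical UDF class com.example.hive.udf.ToUpper packaged in my_udfs.jar. None of these names come from the text, so treat this as an outline of the syntax rather than a working job.

```sql
-- Ship the (hypothetical) scripts to the cluster so each task can run them.
ADD FILE stock_mapper.py;
ADD FILE stock_reducer.py;

-- MAP and REDUCE stream rows through the external scripts.
FROM (
  FROM stock_analysis
  MAP stock_symbol, stock_price_close
  USING 'python stock_mapper.py'
  AS symbol, price
  CLUSTER BY symbol
) map_output
INSERT OVERWRITE TABLE stock_summary
  REDUCE map_output.symbol, map_output.price
  USING 'python stock_reducer.py'
  AS symbol, avg_price;

-- Register a Java UDF from a (hypothetical) JAR and call it like a built-in function.
ADD JAR my_udfs.jar;
CREATE TEMPORARY FUNCTION to_upper AS 'com.example.hive.udf.ToUpper';
SELECT to_upper(stock_symbol) FROM stock_analysis;
```

CLUSTER BY ensures that all rows with the same symbol reach the same reducer script instance, which is what lets the reducer aggregate per symbol.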
For a complete reference on Hive data types and HQL, see the Apache Hive language manual site:
Hive Storage
Hive stores all its metadata in a dedicated store called the Hive MetaStore. Traditional Hive uses its native Derby
database by default, but Hive can also be configured to use MySQL as its MetaStore. HDInsight extends this
capability: the Hive MetaStore can also be configured to use SQL Server or SQL Azure. To customize your MetaStore,
you can modify the Hive configuration file hive-site.xml, found under the conf folder in the Hive installation
directory. You can also customize the Hive MetaStore while deploying your HDInsight cluster through the
CUSTOM CREATE wizard, which is explained in Chapter 3.
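As a rough illustration, a hive-site.xml fragment that points the MetaStore at a MySQL database might look like the following. The host name, database name, user name, and password are placeholders invented for this sketch, and the exact driver class depends on the JDBC connector version you install.

```xml
<configuration>
  <!-- JDBC connection to the MetaStore database; host and database name are placeholders -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-host:3306/hive_metastore?createDatabaseIfNotExist=true</value>
  </property>
  <!-- JDBC driver class; assumes the MySQL connector JAR is on Hive's classpath -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <!-- Placeholder credentials for the MetaStore database -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive_user</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>change-me</value>
  </property>
</configuration>
```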
The Hive ODBC Driver
One of the main advantages of Hive is that it provides a querying experience that is similar to that of a relational
database, which is a familiar experience for many business users. Additionally, the ODBC driver for Hive enables
users to connect to HDInsight and execute HiveQL queries from familiar tools like Excel, SQL Server Integration
Services (SSIS), PowerView, and others. Essentially, the driver allows all ODBC-compliant clients to consume
HDInsight data through familiar ODBC Data Source Names (DSNs), thus exposing HDInsight to a wide range of client
applications.
Installing the Driver
The driver comes in two flavors: 64-bit and 32-bit. Be sure to install both the 32-bit and 64-bit versions of the
driver; you'll need to install them separately. If you install only the 64-bit driver, you'll get errors in your 32-bit
applications (for example, Visual Studio) when trying to configure your connections. The driver can be downloaded
and installed from the following site:
Once the installation of the driver is complete, you can confirm the installation status by checking that the
Microsoft Hive ODBC Driver is present in the ODBC Data Source Administrator's list of drivers, as shown in
Figure 8-4.