HDFS, Hive, HBase, and HCatalog - Microsoft Big Data Solutions


of storing data in an actual table structure, the data continues to live in its

original file format.

At this point in the example, although we have defined a table schema, no
data has been explicitly loaded. To load data after the table is created, you
could use the following command to load all the IIS web server log files
that exist in the logs directory:

load data inpath '/logs'

overwrite into table iislog;
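The command above reads from HDFS and, because of the overwrite keyword, replaces whatever the table already holds. As a minimal sketch of two common variations, the statements below show an append and a load from the local file system. They are alternatives rather than a sequence to run together, because a non-local load moves the source files into the table's storage location. The local path /tmp/iislogs is a hypothetical example, not something defined earlier in this chapter.

```sql
-- Variation 1: append instead of replace.
-- Omitting OVERWRITE keeps any rows already present in iislog.
LOAD DATA INPATH '/logs'
INTO TABLE iislog;

-- Variation 2: load from the local file system of the machine running
-- the Hive client rather than from HDFS.
-- '/tmp/iislogs' is a hypothetical path used only for illustration.
LOAD DATA LOCAL INPATH '/tmp/iislogs'
INTO TABLE iislog;
```

With the LOCAL keyword the files are copied rather than moved, so the originals remain in place on the local disk.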

This demonstration only scratches the surface of the capabilities in Hive.

Hive supports a robust set of features, including complex data types (maps,

structs, and arrays), partitioning, views, and indexes. These features are

beyond the scope of this book, but they certainly warrant further exploration

if you intend to use this technology.

Querying Data

Just as HQL was used previously to create a Hive table, it can subsequently
be used to query the data for summarization or analysis. The syntax, as you
might expect, is almost identical to that used to query a SQL Server
database. Don't be fooled, though. Although the interface looks a lot like
SQL, behind the scenes Hive does quite a bit of heavy lifting to optimize and
convert the SQL-like syntax to one or more MapReduce jobs that are used to
satisfy the query:

SELECT *

FROM iislog;

This simple query, much like its counterparts in the SQL world, returns all
rows found in the iislog table. Although this is not a sophisticated query,
HQL supports everything from basic operations such as sorts and joins to
more sophisticated operations, including group by, unions, and even
user-defined functions. The following is a common example of a group by
query that counts the number of times each URI occurs in the web server
logs:

SELECT uristem, COUNT(*)

FROM iislog

GROUP BY (uristem);
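Building on the group by query, the following sketch shows two things one might try next: ranking the URIs by request count, and asking Hive to display the plan it would generate for the query. The column alias hits is a hypothetical name introduced here for illustration, and the exact plan output depends on the Hive version and execution engine in use.

```sql
-- Name the aggregate so it can be referenced in ORDER BY,
-- then return only the ten most frequently requested URIs.
SELECT uristem, COUNT(*) AS hits
FROM iislog
GROUP BY uristem
ORDER BY hits DESC
LIMIT 10;

-- Show the execution plan for the group by query without running it.
-- The output lists the stages Hive would execute to satisfy the query.
EXPLAIN
SELECT uristem, COUNT(*) AS hits
FROM iislog
GROUP BY uristem;
```

Reviewing the EXPLAIN output is one way to see the translation from SQL-like syntax into jobs that the preceding text describes.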
