of storing data in an actual table structure, the data continues to live in its
original file format.
At this point in the example, although we have defined a table schema, no
data has been explicitly loaded. To load data after the table is created, you
could use the following command to load all of the IIS web server log files
that exist in the logs directory:
load data inpath '/logs'
overwrite into table iislog;
This demonstration only scratches the surface of the capabilities in Hive.
Hive supports a robust set of features, including complex data types (maps,
structs, and arrays), partitioning, views, and indexes. These features are
beyond the scope of this book, but they certainly warrant further exploration
if you intend to use this technology.
Querying Data
Just as HQL was used previously to create a Hive table, it can subsequently
be used to query data for the purposes of summarization or analysis. The
syntax, as you might expect, is almost identical to that used to query a
SQL Server database. Don't be fooled, though. Although the interface looks
a lot like SQL, behind the scenes Hive does quite a bit of heavy lifting to
optimize and convert the SQL-like syntax into one or more MapReduce jobs
that are used to satisfy the query:
SELECT *
FROM iislog;
This simple query, much like its counterparts in the SQL world, returns
all rows found in the iislog table. Although this is not a sophisticated
query, HQL supports everything from basic operations such as sorts and joins
to more sophisticated operations, including group by, unions, and even
user-defined functions. The following is a common example of a
group by query that counts the number of times each URI occurs in the web
server logs:
SELECT uristem, COUNT(*)
FROM iislog
GROUP BY (uristem);
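To make the idea of Hive converting a query into MapReduce work more concrete, the following Python sketch mimics, in memory, the map, shuffle, and reduce steps that a group by count implies. It is a conceptual illustration only, not the plan Hive actually generates. The function names and the sample rows are hypothetical, and only the uristem column is taken from the example above.

```python
from collections import defaultdict


def map_phase(rows):
    """Emit a (uristem, 1) pair for every log row."""
    for row in rows:
        yield row["uristem"], 1


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key to get the per-URI count."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_uristem(rows):
    """Conceptual equivalent of: SELECT uristem, COUNT(*) ... GROUP BY uristem."""
    return reduce_phase(shuffle(map_phase(rows)))


# Hypothetical sample rows standing in for the iislog table.
sample_rows = [
    {"uristem": "/default.aspx"},
    {"uristem": "/about.aspx"},
    {"uristem": "/default.aspx"},
]

print(count_by_uristem(sample_rows))
# → {'/default.aspx': 2, '/about.aspx': 1}
```

In a real cluster, the map and reduce steps run in parallel across many nodes and the shuffle moves data between them, which is why even a short HQL statement can represent a substantial amount of work.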