FROM (
  MAP doctext USING 'python wc_mapper.py' AS (word, cnt)
  FROM docs
  CLUSTER BY word
) a
REDUCE word, cnt USING 'python wc_reduce.py';
FIGURE 2.13
An example HiveQL query. (From A. Thusoo et al., PVLDB, 2(2), 1626-1629,
2009.)
Thus, it supports all the major primitive types (e.g., integers, floats, strings) as well
as complex types (e.g., maps, lists, structs). Hive supports queries expressed in an
SQL-like declarative language, HiveQL,* which can therefore be easily understood by
anyone who is familiar with SQL. These queries are compiled into MapReduce jobs
that are executed using Hadoop. In addition, HiveQL enables users to plug custom
MapReduce scripts into queries [125]. For example, the canonical MapReduce word
count example on a table of documents (Figure 2.1) can be expressed in HiveQL as
depicted in Figure 2.13, where the MAP clause indicates how the input column (doctext)
is transformed by a user program ('python wc_mapper.py') into the output
columns (word and cnt). The CLUSTER BY clause distributes the subquery's output on
word, so that all rows with the same word are sent to the same reducer and arrive
sorted by word. The REDUCE clause then specifies the user program to invoke
('python wc_reduce.py') on the output columns of the subquery.
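The source does not show wc_mapper.py or wc_reduce.py. The following is a minimal sketch of what they might look like, assuming Hive's default script contract for MAP/REDUCE: each input row reaches the script's standard input as one line with its columns separated by tabs, and each line the script prints is split on tabs into the AS (...) columns. It also assumes naive whitespace tokenization and that doctext contains no embedded tabs or newlines. For brevity, both steps live in one hypothetical module selected by a command-line argument (`map` or `reduce`); separate wc_mapper.py and wc_reduce.py scripts would each call just one of the two functions.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, Sequence


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Mapper: one doctext row per line in, one 'word<TAB>1' row per word out."""
    for line in lines:
        for word in line.split():  # naive whitespace tokenization (assumption)
            yield f"{word}\t1"


def reduce_lines(lines: Iterable[str]) -> Iterator[str]:
    """Reducer: 'word<TAB>cnt' rows in, one 'word<TAB>total' row per word out."""
    # Skip blank lines and split each row into its (word, cnt) columns.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    # CLUSTER BY word delivers all rows for a word contiguously to one reducer,
    # so groupby only needs to merge adjacent runs with the same key.
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(cnt) for _, cnt in group)
        yield f"{word}\t{total}"


def main(argv: Sequence[str]) -> None:
    """Hypothetical entry point: `python wc.py map` or `python wc.py reduce`."""
    steps = {"map": map_lines, "reduce": reduce_lines}
    if len(argv) != 2 or argv[1] not in steps:
        raise SystemExit("usage: wc.py map|reduce")
    for row in steps[argv[1]](sys.stdin):
        print(row)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Outside Hive, the same data flow can be approximated with a shell pipe such as `cat docs.txt | python wc.py map | sort | python wc.py reduce`, where `sort` stands in for the grouping that CLUSTER BY provides.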
HiveQL supports Data Definition Language (DDL) statements, which can be
used to create, drop, and alter tables in a database [124]. It allows users to load
data from external sources and insert query results into Hive tables via the load
and insert Data Manipulation Language (DML) statements, respectively. However,
HiveQL currently does not support appending, updating, or deleting rows in existing
tables (in particular, the INSERT INTO, UPDATE, and DELETE statements). This
restriction allows Hive to use very simple mechanisms to deal with concurrent read
and write operations without implementing complex locking protocols. The metastore
component is Hive's system catalog, which stores metadata about the underlying
tables. This metadata is specified during table creation and reused every time the
table is referenced in HiveQL. The metastore distinguishes Hive as a traditional
warehousing solution when compared with similar data-processing systems that are
built on top of MapReduce-like architectures, such as Pig Latin [109].
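The text does not say which "very simple mechanisms" Hive relies on. As a generic illustration only (a local-file analogue, not Hive's HDFS implementation), the sketch below shows why forbidding in-place row updates removes the need for locks: a writer that can only replace a table's contents wholesale can build the new version off to the side and swap it in with one atomic rename, so readers never observe a half-written table. The function names and the one-row-per-line file layout are hypothetical.

```python
import os
import tempfile
from typing import Iterable, List


def overwrite_table_file(path: str, rows: Iterable[str]) -> None:
    """Replace the whole contents of `path` with `rows` (one row per line).

    The new contents go to a temporary file in the same directory and are then
    swapped in with os.replace, which is atomic on POSIX when both paths are on
    the same filesystem. A concurrent reader therefore opens either the complete
    old file or the complete new one, never a mixture of the two.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            for row in rows:
                tmp.write(row + "\n")
        os.replace(tmp_path, path)  # the only step a reader can observe
    except BaseException:
        # Remove the partial temporary file; the old table stays untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_table_file(path: str) -> List[str]:
    """Read every row without locking, since files are never edited in place."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]
```

The design trades write flexibility for simplicity: there is no way to change one row, but there is also no lock manager, because the only mutation is a single atomic swap.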
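To make the metastore's role concrete, here is a toy in-memory catalog in the spirit of the description: a table's schema and location are recorded once, at creation time, and looked up whenever a query references the table. This is a hypothetical sketch, not Hive's metastore API; the class names, methods, and the example location are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TableMeta:
    """Catalog entry recorded once, when the table is created."""
    name: str
    columns: Tuple[Tuple[str, str], ...]  # (column name, column type) pairs
    location: str                          # where the table's data lives


class Metastore:
    """Toy in-memory catalog mapping table names to their metadata."""

    def __init__(self) -> None:
        self._tables: Dict[str, TableMeta] = {}

    def create_table(self, name: str, columns: List[Tuple[str, str]],
                     location: str) -> TableMeta:
        # Analogue of CREATE TABLE: the schema is stored here exactly once.
        if name in self._tables:
            raise ValueError(f"table {name!r} already exists")
        meta = TableMeta(name, tuple(columns), location)
        self._tables[name] = meta
        return meta

    def drop_table(self, name: str) -> None:
        # Analogue of DROP TABLE; raises KeyError for an unknown table.
        del self._tables[name]

    def resolve(self, name: str, referenced: List[str]) -> List[Tuple[str, str]]:
        """Look up a table named in a query and type its referenced columns.

        Called each time a query mentions the table, so the query itself never
        has to restate the schema.
        """
        meta = self._tables.get(name)
        if meta is None:
            raise KeyError(f"unknown table {name!r}")
        types = dict(meta.columns)
        missing = [c for c in referenced if c not in types]
        if missing:
            raise KeyError(f"unknown column(s) {missing} in table {name!r}")
        return [(c, types[c]) for c in referenced]
```

For example, after `create_table("docs", [("doctext", "string")], "/warehouse/docs")`, a query compiler could call `resolve("docs", ["doctext"])` to obtain the column's type without the query declaring it.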
2.4.4 Tenzing
The Tenzing system [33] has been presented by Google as an SQL query execution
engine that is built on top of MapReduce and provides a comprehensive
SQL92 implementation with some SQL99 extensions (e.g., the ROLLUP() and CUBE()
OLAP extensions). Tenzing also supports querying data in different formats, such
as row stores (e.g., MySQL databases), column stores, Bigtable (Google's built in
* http://wiki.apache.org/hadoop/Hive/LanguageManual.