Database Reference
In-Depth Information
Address STRUCT
<
Street:STRING, City:STRING,
Region:STRING, PostalCode:STRING, Country:STRING
>
,
Territories ARRAY
<
STRING
>
)
ROW FORMAT
DELIMITED FIELDS TERMINATED BY
'
,
'
COLLECTION ITEMS TERMINATED BY
'
|
'
LINES TERMINATED BY
'
\
n
'
STORED AS TEXTFILE;
HiveQL allows to implement the typical relational operations. The query
below performs a projection over the
Name
and
City
attributes:
SELECT Name, Address.City
FROM Employees;
A selection operation that obtains employees related to more than four
territories is expressed as follows:
SELECT *
FROM Employees
WHERE Size(Territories)
>
4;
HiveQL supports different join operations, such as
INNER JOIN
,
OUTER
JOIN
,and
LEFT SEMI JOIN
, among others. Below, we join tables
Employees
and
Orders
:
SELECT *
FROM Employees E JOIN Orders O ON E.EmployeeID = O.EmployeeID
HiveQL also supports other SQL-like clauses, for example,
GROUP BY
,
HAVING
,and
ORDER BY
.
Hive also supports computations that go beyond SQL-like languages,
for example, generating machine learning models. For this, Hive provides
language constructs that allow users to plug in their own transformation
scripts in an SQL statement. This is done through the
MAP
,
REDUCE
,
TRANSFORM
,
DISTRIBUTE BY
,
SORT BY
,and
CLUSTER BY
keywords
in the SQL extensions. As an example, we show how we can write a Hive
program to count the occurrences of products in an input file, like in the
example of Fig.
13.1
. This is a variant of the typical word count example:
CREATE TABLE Products (Content STRING);
FROM
(MAP Products.Content
USING
'
tokenizerScript
'
AS ProductID, Count
FROM Products
CLUSTER BY ProductID) mapOut
REDUCE mapOut.ProductID, mapOut.Count
USING
'
countScript
'
AS ProductID, Count;
The scripts
tokenizerScript
and
countScript
can be implemented in any
language, like Python or Java. The former script produces a tuple for each
new product in the input; the latter script counts the number of occurrences
Search WWH ::
Custom Search