Database Reference
In-Depth Information
Address STRUCT < Street:STRING, City:STRING,
Region:STRING, PostalCode:STRING, Country:STRING > ,
Territories ARRAY < STRING > )
ROW FORMAT
DELIMITED FIELDS TERMINATED BY ' , '
COLLECTION ITEMS TERMINATED BY '
|
'
LINES TERMINATED BY '
\
n '
STORED AS TEXTFILE;
HiveQL allows to implement the typical relational operations. The query
below performs a projection over the Name and City attributes:
SELECT Name, Address.City
FROM Employees;
A selection operation that obtains employees related to more than four
territories is expressed as follows:
SELECT *
FROM Employees
WHERE Size(Territories) > 4;
HiveQL supports different join operations, such as INNER JOIN , OUTER
JOIN ,and LEFT SEMI JOIN , among others. Below, we join tables Employees
and Orders :
SELECT *
FROM Employees E JOIN Orders O ON E.EmployeeID = O.EmployeeID
HiveQL also supports other SQL-like clauses, for example, GROUP BY ,
HAVING ,and ORDER BY .
Hive also supports computations that go beyond SQL-like languages,
for example, generating machine learning models. For this, Hive provides
language constructs that allow users to plug in their own transformation
scripts in an SQL statement. This is done through the MAP , REDUCE ,
TRANSFORM , DISTRIBUTE BY , SORT BY ,and CLUSTER BY keywords
in the SQL extensions. As an example, we show how we can write a Hive
program to count the occurrences of products in an input file, like in the
example of Fig. 13.1 . This is a variant of the typical word count example:
CREATE TABLE Products (Content STRING);
FROM
(MAP Products.Content
USING ' tokenizerScript ' AS ProductID, Count
FROM Products
CLUSTER BY ProductID) mapOut
REDUCE mapOut.ProductID, mapOut.Count
USING ' countScript ' AS ProductID, Count;
The scripts tokenizerScript and countScript can be implemented in any
language, like Python or Java. The former script produces a tuple for each
new product in the input; the latter script counts the number of occurrences
Search WWH ::




Custom Search