Database Reference
In-Depth Information
The basic
SELECT
statement is intact, along with familiar elements such as
WHERE
clauses, table and column aliases, and
ORDER BY
clauses:
SELECT c.name, c.city, c.state, c.postalCode,
c.purchases
FROM MsBigData.customer c LIMIT 100
WHERE c.state='FL'
ORDER BY c.postalCode;
NOTE
One useful difference to note is the
LIMIT
clause. This restricts the
query to an upper limit of rows that it can return. If you are used to
SQL Server, you might be familiar with the
TOP
clause.
LIMIT
works in
the same way, but it doesn't support percentage based row limits.
LIMIT
can prove very handy when you are exploring data and don't
want to process millions or billions of rows in your Hive tables.
When you run the
SELECT
statement, you'll notice that the results are
as expected, with the exception of the
purchases
column. Because that
column represents a collection of values, Hive flattens it into something
that it can return as a column value. It does this using Java Script Object
Notation (JSON), a standard format for representing objects:
John Smith Jacksonville FL 32226
{"Food":456.98,"Lodging":1245.45}
This might be useful to get a quick look at a table's data, but in most
instances you will want to extract portions of the value out. Querying
individual elements of complex types is fairly straightforward. For
MAP
types, you reference the key value:
SELECT c.name, c.city, c.state, c.postalCode,
c.purchases['Lodging']
If
purchases
were an
ARRAY
, you would use the index of the value you are
interested in: