Adding Structure with Hive - Microsoft Big Data Solutions

Database Reference

In-Depth Information

The basic SELECT statement is intact, along with familiar elements such as

WHERE clauses, table and column aliases, and ORDER BY clauses:

SELECT c.name, c.city, c.state, c.postalCode,

c.purchases

FROM MsBigData.customer c LIMIT 100

WHERE c.state='FL'

ORDER BY c.postalCode;

NOTE

One useful difference to note is the LIMIT clause. This restricts the

query to an upper limit of rows that it can return. If you are used to

SQL Server, you might be familiar with the TOP clause. LIMIT works in

the same way, but it doesn't support percentage based row limits.

LIMIT can prove very handy when you are exploring data and don't

want to process millions or billions of rows in your Hive tables.

When you run the SELECT statement, you'll notice that the results are

as expected, with the exception of the purchases column. Because that

column represents a collection of values, Hive flattens it into something

that it can return as a column value. It does this using Java Script Object

Notation (JSON), a standard format for representing objects:

John Smith Jacksonville FL 32226

{"Food":456.98,"Lodging":1245.45}

This might be useful to get a quick look at a table's data, but in most

instances you will want to extract portions of the value out. Querying

individual elements of complex types is fairly straightforward. For MAP

types, you reference the key value:

SELECT c.name, c.city, c.state, c.postalCode,

c.purchases['Lodging']

If purchases were an ARRAY , you would use the index of the value you are

interested in:

Search WWH ::

Custom Search

Home