Adding Structure with Hive - Microsoft Big Data Solutions


Type | Description | Examples | SQL Server Equivalent

(cont.) | ...precision. String: JDBC-compliant timestamp format YYYY-MM-DD HH:MM:SS.fffffffff. | |
DATE | A date in YYYY-MM-DD format. | | date
BINARY | A series of bytes. | | binary(n)
STRUCT | Defines a column that contains a fixed set of fields, each with its own type. | struct('John', 'Smith') |
MAP | Defines a collection of key/value pairs. | map('first', 'John', 'last', 'Smith') |
ARRAY | Defines an ordered collection of values. | array('John', 'Smith') |
UNION (declared as UNIONTYPE) | Similar to the sql_variant type: a column holds one value at a time, but that value can be any one of the types defined for the column. | Varies depending on column | sql_variant
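To make the Examples column concrete, the Python below models what Hive's struct(), map(), array(), and create_union() functions produce and how each element is addressed afterward. It is a sketch of the semantics only, assuming Hive's documented behavior that unnamed struct fields are named col1, col2, and so on, that map() takes alternating keys and values, and that arrays are indexed from 0. The helper names hive_struct, hive_map, hive_array, and hive_union are hypothetical and are not part of Hive.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical helpers that model the values Hive's constructor functions
# return. This is an illustration in Python, not Hive code.

def hive_struct(*values: Any) -> Dict[str, Any]:
    """Like struct(): unnamed values get the field names col1, col2, ..."""
    return {f"col{i}": v for i, v in enumerate(values, start=1)}

def hive_map(*kv: Any) -> Dict[Any, Any]:
    """Like map(): arguments alternate key, value, key, value, ..."""
    if len(kv) % 2 != 0:
        raise ValueError("map() needs an even number of arguments")
    return dict(zip(kv[0::2], kv[1::2]))

def hive_array(*values: Any) -> List[Any]:
    """Like array(): an ordered list whose first element is at index 0."""
    return list(values)

def hive_union(tag: int, *alternatives: Any) -> Tuple[int, Any]:
    """Like create_union(): keeps only the alternative selected by tag."""
    return (tag, alternatives[tag])

name = hive_struct('John', 'Smith')
parts = hive_map('first', 'John', 'last', 'Smith')
names = hive_array('John', 'Smith')
either = hive_union(1, 42, 'forty-two')

print(name["col1"])    # HiveQL equivalent: name.col1    -> John
print(parts["last"])   # HiveQL equivalent: parts['last'] -> Smith
print(names[1])        # HiveQL equivalent: names[1]      -> Smith
print(either)          # (1, 'forty-two')
```

The union is modeled as a (tag, value) pair because, like sql_variant, it stores a single value plus an indication of which declared type that value has.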

The types that are unique to Hive are MAP, ARRAY, and STRUCT. Hive supports these types so that it can work better with the denormalized data that is often found in Hadoop data stores. Relational database tables are typically normalized; that is, a row holds only one value for a given column. In Hadoop, though, it is not uncommon to find data where many values are stored in a single row for one “column” (for example, all of a customer's phone numbers in one field). This denormalization makes the data easier and faster to write, but more challenging to retrieve in a tabular format.
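As an illustration of what "many values stored in a row for a column" can look like on disk, the Python below parses one line of a delimited text file laid out the way Hive's default text format stores complex types: Ctrl-A (\x01) between columns, Ctrl-B (\x02) between collection items or struct fields, and Ctrl-C (\x03) between a map key and its value. The table layout (id, name, phones, props) and the parse_row helper are hypothetical; a real Hive table declares its own columns and may override these delimiters.

```python
from typing import Any, Dict

# Hive's default delimiters for text-format tables.
FIELD_SEP = "\x01"  # Ctrl-A: between top-level columns
ITEM_SEP = "\x02"   # Ctrl-B: between array items, struct fields, map entries
KEY_SEP = "\x03"    # Ctrl-C: between a map key and its value

def parse_row(line: str) -> Dict[str, Any]:
    """Parse one line of a hypothetical table:
    id INT, name STRUCT<first:STRING,last:STRING>,
    phones ARRAY<STRING>, props MAP<STRING,STRING>."""
    id_, name, phones, props = line.split(FIELD_SEP)
    first, last = name.split(ITEM_SEP)
    return {
        "id": int(id_),
        "name": {"first": first, "last": last},
        "phones": phones.split(ITEM_SEP) if phones else [],
        # Each map entry is "key<Ctrl-C>value"; split on the first Ctrl-C only.
        "props": dict(
            entry.split(KEY_SEP, 1) for entry in props.split(ITEM_SEP) if entry
        ),
    }

# One denormalized line: a single row carries two phone numbers and two
# properties, all written in one pass.
raw = FIELD_SEP.join([
    "1",
    ITEM_SEP.join(["John", "Smith"]),
    ITEM_SEP.join(["555-0100", "555-0199"]),
    ITEM_SEP.join(["city" + KEY_SEP + "Seattle", "tier" + KEY_SEP + "gold"]),
])

row = parse_row(raw)
print(row["phones"])          # ['555-0100', '555-0199']
print(row["props"]["city"])   # Seattle
```

Writing the line is a single append, which is why this layout is cheap to produce; reading it back as rows and columns is the part that needs help from the type system.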

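Hive addresses this with the MAP, ARRAY, and STRUCT types. A developer declares a column with one of these types to hold the nested values, and queries can then pull out individual elements by key, index, or field name, which flattens the denormalized data into a multicolumn structure.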
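The flattening step can be sketched in Python as two operations over rows that have already been parsed into nested values. The first projects struct fields, array elements, and map values into separate columns, roughly what `SELECT id, name.first, name.last, phones[0], props['city']` does in HiveQL. The second turns each array element into its own row, roughly what `LATERAL VIEW explode(phones)` does. The sample rows and the flatten and explode_phones helpers are hypothetical illustrations, not Hive code.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Already-parsed rows of a hypothetical table with
# name STRUCT<first:STRING,last:STRING>, phones ARRAY<STRING>,
# props MAP<STRING,STRING>.
rows: List[Row] = [
    {"id": 1, "name": {"first": "John", "last": "Smith"},
     "phones": ["555-0100", "555-0199"], "props": {"city": "Seattle"}},
    {"id": 2, "name": {"first": "Jane", "last": "Doe"},
     "phones": [], "props": {}},
]

def flatten(row: Row) -> Tuple[Any, ...]:
    """One flat output row per input row, one column per extracted element."""
    phones = row["phones"]
    return (
        row["id"],
        row["name"]["first"],             # struct field by name
        row["name"]["last"],
        phones[0] if phones else None,    # Hive yields NULL for a missing index
        row["props"].get("city"),         # ...and for a missing map key
    )

def explode_phones(row: Row) -> List[Tuple[int, str]]:
    """One output row per array element; an empty array yields no rows,
    as LATERAL VIEW does without the OUTER keyword."""
    return [(row["id"], phone) for phone in row["phones"]]

flat = [flatten(r) for r in rows]
exploded = [pair for r in rows for pair in explode_phones(r)]
print(flat)      # [(1, 'John', 'Smith', '555-0100', 'Seattle'), (2, 'Jane', 'Doe', None, None)]
print(exploded)  # [(1, '555-0100'), (1, '555-0199')]
```

Projection keeps one output row per stored row, while exploding trades width for length; which one fits depends on whether the nested values should become columns or rows.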
