The details of querying this denormalized data are discussed later in this
chapter. For now, we will review the data types that support these
structures.
A STRUCT is a column that contains multiple defined fields. Each field can
have its own data type. This is comparable to structs in most programming
languages. In Hive, you can declare a STRUCT for a full name using the
following syntax:
STRUCT<FirstName:string, MiddleName:string, LastName:string>
To access the individual fields of the STRUCT type, use the column name
followed by a period and the name of the field:
FullName.FirstName
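As a minimal sketch of how the declaration and the dot notation fit together, the following HiveQL places the STRUCT in a table definition and then reads two of its fields. The table name customers and the CustomerId column are assumptions made for illustration; they do not come from the text above.

```sql
-- Hypothetical table: "customers" and "CustomerId" are illustrative names.
CREATE TABLE customers (
  CustomerId INT,
  FullName   STRUCT<FirstName:STRING, MiddleName:STRING, LastName:STRING>
);

-- Read individual STRUCT fields with column.field notation.
SELECT FullName.FirstName,
       FullName.LastName
FROM customers;
```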
An ARRAY is a column that contains an ordered sequence of values. All the
values must be of the same type:
ARRAY<STRING>
Because it is ordered, the individual values can be accessed by their index.
As with Java and .NET languages, ARRAY types use a zero-based index, so
you use an index of 0 to access the first element, and an index of 2 to access
the third element. If the preceding Full Name column were declared as an
ARRAY, with first name in the first position, middle name in the second
position, and last name in the third position, you would access the first name
with index 0 and last name with index 2:
FullName[0], FullName[2]
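The same idea can be sketched as a full statement. The table name customers_array and the CustomerId column are hypothetical, and the sketch assumes each row stores first, middle, and last name in that order, as described above.

```sql
-- Hypothetical table: "customers_array" and "CustomerId" are illustrative names.
CREATE TABLE customers_array (
  CustomerId INT,
  FullName   ARRAY<STRING>
);

-- Index 0 is the first element (first name); index 2 is the third (last name).
SELECT FullName[0] AS FirstName,
       FullName[2] AS LastName
FROM customers_array;
```

Unlike the STRUCT version, nothing in the ARRAY type records which position holds which part of the name, so the ordering convention has to be maintained by whatever loads the data.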
A MAP column is a collection of key/value pairs, where the key and the
value each have a data type. The key and value do not have to use the same
data type. A MAP for Full Name might be declared using the following syntax:
MAP<string, string>
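For context, here is a rough sketch of the MAP declaration inside a table definition, alongside a second MAP whose key and value types differ. The table name customers_map, the columns CustomerId and OrderCounts, and the key 'FirstName' used in the lookup are all assumptions made for illustration.

```sql
-- Hypothetical table: every name here other than FullName is illustrative.
CREATE TABLE customers_map (
  CustomerId  INT,
  FullName    MAP<STRING, STRING>,  -- key and value share a type
  OrderCounts MAP<STRING, INT>      -- key and value types differ
);

-- Look up a value by key using bracket notation.
-- 'FirstName' is an assumed key, not one defined by the declaration itself.
SELECT FullName['FirstName'] AS FirstName
FROM customers_map;
```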
In the Full Name case, you would populate the MAP column with the
following key/value pairs: