Adding Structure with Hive - Microsoft Big Data Solutions


'FirstName', 'John'

'MiddleName', 'Doe'

'LastName', 'Smith'

You can access MAP column elements using the same syntax as you use with an ARRAY, except that you use the key value instead of the position as the index. To access the first and last names, use this syntax:

FullName['FirstName'], FullName['LastName']
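To put the key lookup in context, here is a minimal HiveQL sketch. The table name customer_names and the CustomerId column are hypothetical, chosen only for illustration; FullName is the MAP column from the example above, assumed here to hold string keys and string values.

```sql
-- Hypothetical table for illustration; only the FullName column comes from the text.
CREATE TABLE customer_names (
  CustomerId INT,
  FullName   MAP<STRING, STRING>
);

-- Look up map values by key rather than by position.
-- A key that is not present in the map yields NULL instead of an error.
SELECT
  FullName['FirstName'] AS first_name,
  FullName['LastName']  AS last_name
FROM customer_names;
```

Because a missing key returns NULL, a query like this does not fail on rows that lack, say, a 'MiddleName' entry.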

After looking at the possible data types, you may be wondering how these are stored in Hive. The next section covers the file formats that can be used to store the data.

File Formats

Hive uses Hadoop as the underlying data store. Because the actual data is stored in Hadoop, it can be in a wide variety of formats. As discussed in Chapter 5, “Storing and Managing Data in HDFS,” Hadoop stores files and doesn't impose any restrictions on the content or format of those files. Hive offers enough flexibility that you can work with almost any file format, but some formats require significantly more effort.

The simplest files to work with in Hive are text files, and this is the default format Hive expects for files. These text files are normally delimited by specific characters. Common formats in business settings are comma-separated value (CSV) files or tab-separated value (TSV) files. However, the drawback of these formats is that commas and tabs often appear in real data; that is, they are embedded inside other text and are not always intended as delimiters. For that reason, Hive by default uses control characters as delimiters, which are less likely to appear in real data. Table 6.2 describes these default delimiters.
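As a sketch of how these delimiters appear in a table definition, the following HiveQL spells them out explicitly. It assumes the commonly documented Hive defaults (Ctrl-A, written '\001', between fields; Ctrl-B, '\002', between collection items; Ctrl-C, '\003', between a map key and its value; and newline between rows). The table and column names are hypothetical.

```sql
-- Hypothetical table; the delimiters below are written out only to make
-- the assumed defaults visible. Omitting the ROW FORMAT clause entirely
-- should give the same result for a text table.
CREATE TABLE customer_text (
  CustomerId INT,
  Phones     ARRAY<STRING>,
  FullName   MAP<STRING, STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'            -- Ctrl-A separates columns
  COLLECTION ITEMS TERMINATED BY '\002'  -- Ctrl-B separates ARRAY items and MAP entries
  MAP KEYS TERMINATED BY '\003'          -- Ctrl-C separates a key from its value
  LINES TERMINATED BY '\n'               -- newline separates rows
STORED AS TEXTFILE;

-- For a comma-separated file, only the field delimiter needs to change.
CREATE TABLE customer_csv (
  CustomerId INT,
  City       STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```

The second table carries the drawback described above: a City value that itself contains a comma would be split into two fields.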
