NOTE
The default delimiters can be overridden when the table is created. This
is useful when you are dealing with text files that use different
delimiters but are otherwise formatted in a very similar way. The options
for doing so are shown in the section “Creating Tables” in this chapter.
Table 6.2 Hive Default Delimiters for Text Files

Delimiter  Octal Code  Description
\n         \012        New line character; this delimits rows in a text file.
^A         \001        Separates columns in each row.
^B         \002        Separates elements in an ARRAY, STRUCT, and key/value pairs in a MAP.
^C         \003        Separates the key from the value in a MAP column.
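
To make the roles of these delimiters concrete, here is a minimal sketch in Python of how a single line in Hive's default text format could be split into columns, ARRAY elements, and MAP entries. It is not Hive's actual SerDe code: the helpers parse_rows, parse_array, and parse_map are hypothetical, and the sketch ignores escaping and nesting deeper than one level.

```python
# Hive's default text-file delimiters, as listed in Table 6.2.
ROW_DELIM = "\n"            # \012: separates rows
FIELD_DELIM = "\x01"        # ^A, \001: separates columns
COLLECTION_DELIM = "\x02"   # ^B, \002: separates ARRAY elements and MAP pairs
MAP_KEY_DELIM = "\x03"      # ^C, \003: separates a MAP key from its value


def parse_rows(text: str) -> list[list[str]]:
    """Hypothetical helper: split text into rows on \\n, then each row into columns on ^A."""
    return [line.split(FIELD_DELIM) for line in text.split(ROW_DELIM) if line]


def parse_array(field: str) -> list[str]:
    """Hypothetical helper: split an ARRAY column into its elements on ^B."""
    return field.split(COLLECTION_DELIM) if field else []


def parse_map(field: str) -> dict[str, str]:
    """Hypothetical helper: split a MAP column into pairs on ^B, then each pair on ^C."""
    result = {}
    if not field:
        return result
    for pair in field.split(COLLECTION_DELIM):
        key, _, value = pair.partition(MAP_KEY_DELIM)
        result[key] = value
    return result


# One row with three columns: a name, an ARRAY of tags, and a MAP of properties.
data = "alice\x01red\x02blue\x01age\x0330\x02city\x03Paris\n"
name, tags, props = parse_rows(data)[0]
print(name)               # alice
print(parse_array(tags))  # ['red', 'blue']
print(parse_map(props))   # {'age': '30', 'city': 'Paris'}
```

Because the column, collection, and key delimiters are separate control characters, the same ^B character can serve both ARRAY and MAP columns without ambiguity: the parser decides how to interpret it based on the column's declared type.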
What if one of the many text files accessed through a Hive table uses a
different column delimiter? In that case, Hive won't be able to parse the
file accurately. The exact results will vary depending on how the text file
is formatted and how the Hive table was configured. However, Hive will most
likely find fewer than the expected number of columns in each row. When that
happens, it fills in the columns it finds values for and returns null values
for any “missing” columns.
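
The following Python sketch imitates that behavior under simplifying assumptions: a row is split on the expected ^A delimiter, and any declared column without a matching value becomes None (standing in for NULL). The read_row helper and the column names are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

FIELD_DELIM = "\x01"  # Hive's default column delimiter (^A, \001)


def read_row(line: str, column_names: list[str]) -> dict[str, str | None]:
    """Hypothetical helper: map the values in one line onto the table's columns.

    Any column without a corresponding value is filled with None (standing in
    for NULL). Extra values beyond the declared columns are ignored in this sketch.
    """
    values = line.split(FIELD_DELIM)
    return {
        name: values[i] if i < len(values) else None
        for i, name in enumerate(column_names)
    }


columns = ["id", "name", "city"]

# A line that uses the expected ^A delimiter parses into all three columns.
print(read_row("1\x01alice\x01Paris", columns))
# {'id': '1', 'name': 'alice', 'city': 'Paris'}

# A line from a comma-delimited file contains no ^A, so the whole line lands in
# the first column and the remaining columns come back as None.
print(read_row("2,bob,London", columns))
# {'id': '2,bob,London', 'name': None, 'city': None}
```

The second call shows why a delimiter mismatch is easy to miss: the query still returns a row, but with the data bunched into one column and nulls everywhere else.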
The same thing will happen if the data values in the files don't match
the data type defined on the Hive table. If a file contains alphanumeric
characters where Hive is expecting only numeric values, it will return null
values. This enables Hive to be resilient to data quality issues with the files
stored in Hadoop.
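
A rough Python analogue of this type-mismatch handling is a conversion that yields None instead of raising an error when a value isn't numeric. The to_int_or_null helper is hypothetical and only approximates the idea; it is not how Hive implements type conversion internally.

```python
from __future__ import annotations


def to_int_or_null(value: str | None) -> int | None:
    """Hypothetical helper: convert a text field to an int, or None (NULL) if it isn't numeric."""
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        # Alphanumeric data in a numeric column: yield None instead of failing the whole read.
        return None


raw_values = ["7", "abc", "100", None]
print([to_int_or_null(v) for v in raw_values])
# [7, None, 100, None]
```

The trade-off is the same one the text describes: reads keep working in the presence of bad data, but bad values silently become nulls, so it can be worth counting nulls in numeric columns to detect data quality problems.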
Some data, however, isn't stored as text. Binary file formats can be faster
and more efficient than text formats because the data takes up less space in
the files. If the data is stored in fewer bytes, more of it can be read from
the disk in a single read operation, and more of it can fit in memory. This
can improve performance, particularly in a big data system.
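
As a small illustration of the space difference, this Python sketch stores the same three integers once as newline-delimited text and once as fixed-width 4-byte binary integers using the standard struct module. The sample values are arbitrary, and the binary layout here is a simple assumption, not any particular Hadoop file format.

```python
import struct

values = [7, 1234567890, 2147483647]

# Text: each value written as decimal digits, one per line (newline row delimiter).
text_bytes = "\n".join(str(v) for v in values).encode("ascii") + b"\n"

# Binary: each value written as a 4-byte little-endian signed integer.
binary_bytes = struct.pack(f"<{len(values)}i", *values)

print(len(text_bytes))    # 24
print(len(binary_bytes))  # 12

# The binary form round-trips back to the original integers.
print(struct.unpack(f"<{len(values)}i", binary_bytes))  # (7, 1234567890, 2147483647)
```

The savings depend on the data: the single-digit value 7 is actually smaller as text (2 bytes including its newline) than as a 4-byte integer. Real binary file formats are also more elaborate than this sketch and often add further encoding and compression.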