The default field delimiter is not a tab character, but the Ctrl-A character from the set of
ASCII control codes (it has ASCII code 1). The choice of Ctrl-A, sometimes written as
^A in documentation, came about because it is less likely to be a part of the field text than
a tab character. There is no means of escaping delimiter characters in Hive, so it is
important to choose ones that don't occur in data fields.
The default collection item delimiter is a Ctrl-B character, used to delimit items in an
ARRAY or STRUCT, or in key-value pairs in a MAP. The default map key delimiter is a
Ctrl-C character, used to delimit the key and value in a MAP. Rows in a table are delimited
by a newline character.
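To make the layout concrete, the following Python sketch splits one line of a default-format text file into its parts. The sample row, the function name parse_row, and the column types (a STRING, an ARRAY<INT>, and a MAP<STRING,INT>) are all invented for illustration; this is not Hive's own deserializer.

```python
# Default Hive text delimiters, as described above.
FIELD_DELIM = "\x01"       # Ctrl-A, separates fields (columns)
COLLECTION_DELIM = "\x02"  # Ctrl-B, separates items in an ARRAY, STRUCT, or MAP
MAP_KEY_DELIM = "\x03"     # Ctrl-C, separates a key from its value in a MAP


def parse_row(line):
    """Split one line into (name, scores, attrs).

    Assumes a hypothetical table with columns
    (name STRING, scores ARRAY<INT>, attrs MAP<STRING,INT>).
    """
    name, raw_scores, raw_attrs = line.rstrip("\n").split(FIELD_DELIM)
    scores = [int(item) for item in raw_scores.split(COLLECTION_DELIM)]
    attrs = {}
    for pair in raw_attrs.split(COLLECTION_DELIM):
        key, value = pair.split(MAP_KEY_DELIM)
        attrs[key] = int(value)
    return name, scores, attrs


# Build a made-up row: name=alice, scores=[1, 2], attrs={'a': 10, 'b': 20}
sample = FIELD_DELIM.join([
    "alice",
    COLLECTION_DELIM.join(["1", "2"]),
    COLLECTION_DELIM.join(["a" + MAP_KEY_DELIM + "10", "b" + MAP_KEY_DELIM + "20"]),
]) + "\n"

print(parse_row(sample))
```

Because nothing is escaped, a field value that itself contained a Ctrl-A would be split in the wrong place, which is why the delimiters need to be characters that never occur in the data.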
WARNING
The preceding description of delimiters is correct for the usual case of flat data structures, where the
complex types contain only primitive types. For nested types, however, this isn't the whole story, and in
fact the level of the nesting determines the delimiter.
For an array of arrays, for example, the delimiters for the outer array are Ctrl-B characters, as expected,
but for the inner array they are Ctrl-C characters, the next delimiter in the list. If you are unsure which
delimiters Hive uses for a particular nested structure, you can run a command like:
CREATE TABLE nested
AS
SELECT array(array(1, 2), array(3, 4))
FROM dummy;
and then use hexdump or something similar to examine the delimiters in the output file.
Hive actually supports eight levels of delimiters, corresponding to ASCII codes 1, 2, ... 8, but you can
override only the first three.
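The nesting rule can be sketched in a few lines of Python. The helper serialize below is hypothetical and only mimics the level-based choice of delimiter described in this warning for nested arrays. It assumes the column itself sits at level 1 (Ctrl-A separates fields), so the outer array uses code 2 and the inner array uses code 3.

```python
def serialize(value, level=2):
    """Join nested lists using the control character for each nesting level.

    level 2 -> Ctrl-B (outer array), level 3 -> Ctrl-C (inner array), and so on.
    """
    if isinstance(value, list):
        delim = chr(level)
        return delim.join(serialize(item, level + 1) for item in value)
    return str(value)


# Mirrors array(array(1, 2), array(3, 4)) from the CREATE TABLE example.
encoded = serialize([[1, 2], [3, 4]])

# Print each byte in hex, similar to what hexdump would reveal.
print(" ".join(f"{b:02x}" for b in encoded.encode("ascii")))
# prints: 31 03 32 02 33 03 34
```

In the hex output, 02 separates the two inner arrays and 03 separates the items within each one.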
Thus, the statement:
CREATE TABLE ... ;
is identical to the more explicit:
CREATE TABLE ...
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
Notice that the octal form of the delimiter characters can be used: 001 for Ctrl-A, for
instance.
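As a quick check of the octal notation, Python happens to accept the same backslash-octal escapes in its string literals, so the mapping to ASCII codes can be verified directly. This only illustrates the notation; it says nothing about how Hive itself parses the clause.

```python
# Octal escapes as written in the ROW FORMAT clause, keyed by control-key name.
delimiters = {"Ctrl-A": "\001", "Ctrl-B": "\002", "Ctrl-C": "\003"}

for name, char in delimiters.items():
    # ord() gives the ASCII code; the 03o format renders it as three octal digits.
    print(f"{name}: ASCII code {ord(char)}, octal {ord(char):03o}")
```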