Text and Binary File Formats
Sqoop is capable of importing into a few different file formats. Text files (the default)
offer a human-readable representation of data, platform independence, and the simplest
structure. However, they cannot hold binary fields (such as database columns of type
VARBINARY), and distinguishing between null values and String-based fields
containing the value "null" can be problematic (although using the --null-string
import option allows you to control the representation of null values).
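The null ambiguity described above can be illustrated with a small, self-contained sketch. This is not Sqoop's implementation: the helper names write_rows and read_rows and the sample rows are hypothetical, and the \N token is only one possible choice of null marker. The sketch writes the same rows to delimited text twice, once with nulls spelled "null" and once with a distinct token, and reads them back.

```python
import csv
import io


def write_rows(rows, null_token):
    """Serialize rows to comma-delimited text, writing None as null_token."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow([null_token if value is None else value for value in row])
    return buf.getvalue()


def read_rows(text, null_token):
    """Parse delimited text, mapping every field equal to null_token back to None."""
    reader = csv.reader(io.StringIO(text))
    return [[None if value == null_token else value for value in row] for row in reader]


# Hypothetical data: one real null and one string whose value is the word "null".
rows = [["1", None], ["2", "null"]]

# With "null" as the marker, the string "null" cannot be told apart from a real null.
ambiguous = read_rows(write_rows(rows, "null"), "null")

# With a marker that does not occur in the data, the round trip preserves both values.
distinct = read_rows(write_rows(rows, "\\N"), "\\N")
```

In the first round trip both rows come back with a null in the second column, so the original string is lost; in the second, the rows come back unchanged. Choosing a null marker that cannot appear in the data is the kind of control the --null-string option is meant to give.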
To handle these conditions, Sqoop also supports SequenceFiles, Avro datafiles, and
Parquet files. These binary formats provide the most precise representation possible of the
imported data. They also allow data to be compressed while retaining MapReduce's
ability to process different sections of the same file in parallel. However, current versions of
Sqoop cannot load Avro datafiles or SequenceFiles into Hive (although you can load
Avro into Hive manually, and Parquet can be loaded directly into Hive by Sqoop).
Another disadvantage of SequenceFiles is that they are Java-specific, whereas Avro and
Parquet files can be processed by a wide range of languages.
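As a rough sketch of how an output format is chosen at import time, the helper below assembles a sqoop import command line for each of the formats discussed. The function name build_import_command, the connection string, and the table name are hypothetical placeholders; the --as-textfile, --as-sequencefile, --as-avrodatafile, and --as-parquetfile arguments are the format options of Sqoop 1.4.x, so check them against the version you run. The sketch only builds the argument list and does not invoke Sqoop.

```python
# Maps a short format name to the corresponding sqoop import option.
FORMAT_FLAGS = {
    "text": "--as-textfile",          # the default format
    "sequencefile": "--as-sequencefile",
    "avro": "--as-avrodatafile",
    "parquet": "--as-parquetfile",
}


def build_import_command(connect, table, file_format="text"):
    """Return the argument list for a sqoop import using the given file format."""
    if file_format not in FORMAT_FLAGS:
        raise ValueError(f"unsupported file format: {file_format}")
    return [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        FORMAT_FLAGS[file_format],
    ]


# Hypothetical connection string and table name, for illustration only.
command = build_import_command("jdbc:mysql://localhost/exampledb", "example_table", "avro")
print(" ".join(command))
```

The resulting list can be passed to subprocess.run on a machine where Sqoop is installed; keeping the arguments as a list avoids shell-quoting problems in the connection string.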