Importing Data - Apache Sqoop

sqoop import --compress \
--compression-codec org.apache.hadoop.io.compress.BZip2Codec

Another benefit of leveraging MapReduce's compression abilities is that Sqoop can
use all Hadoop compression codecs out of the box. You don't need to enable
compression codecs within Sqoop itself. That said, Sqoop can't use any compression
algorithm not known to Hadoop. Before using a codec with Sqoop, make sure it is
properly installed and configured on all nodes in your cluster.

As Sqoop delegates compression to the MapReduce engine, you need to make sure
that compressed map output is allowed in your Hadoop configuration. For example,
if the property mapred.output.compress is set to false with the final flag in the
mapred-site.xml file, Sqoop won't be able to compress the output files even when
you call it with the --compress parameter.
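One way to catch this situation before running an import is to inspect the configuration file for the pinned property. The following is a minimal sketch, not part of Sqoop or Hadoop: the is_pinned function name is hypothetical, the file location in the usage comment is an assumption about your installation, and the text matching is deliberately simple (it ignores XML comments and treats the dots in the property name as regular-expression wildcards).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Sqoop or Hadoop).
# is_pinned FILE PROPERTY VALUE
# Succeeds when FILE contains a <property> block that sets PROPERTY to VALUE
# and also marks it with <final>true</final>.
is_pinned() {
    file=$1
    prop=$2
    value=$3
    # Flatten the file to one line, then start a new line after every
    # </property> so each property block can be matched on its own.
    tr -d '\n' < "$file" |
        sed 's:</property>:&\
:g' |
        grep "<name>[[:space:]]*$prop[[:space:]]*</name>" |
        grep "<value>[[:space:]]*$value[[:space:]]*</value>" |
        grep -q "<final>[[:space:]]*true[[:space:]]*</final>"
}

# Example usage (the path is an assumption; adjust it to your cluster):
#   if is_pinned /etc/hadoop/conf/mapred-site.xml mapred.output.compress false; then
#       echo "Output compression is disabled and final; --compress will have no effect" >&2
#   fi
```

On Hadoop releases that use the newer property names, the same check could be pointed at mapreduce.output.fileoutputformat.compress instead.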

The selected compression codec might have a significant impact on subsequent
processing. Some codecs do not support seeking to the middle of the compressed file
without reading all previous content, which effectively prevents Hadoop from
processing the input files in parallel. You should use a splittable codec for data that
you're planning to use in subsequent processing. Table 2-2 contains a list of splittable
and nonsplittable compression codecs that will help you choose the proper codec for
your use case.

Table 2-2. Compression codecs

Splittable      Not Splittable
BZip2, LZO      GZip, Snappy
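The table above can be turned into a small lookup when scripting imports. The sketch below is illustrative only: lower, codec_class, is_splittable, and compress_args are hypothetical helper names that do not exist in Sqoop, and the LZO class name assumes the separately installed hadoop-lzo library.

```shell
#!/bin/sh
# Hypothetical helpers based on Table 2-2; not part of Sqoop.

# lower NAME: print NAME in lowercase so lookups are case-insensitive.
lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

# codec_class NAME: print the fully qualified Hadoop codec class.
codec_class() {
    case $(lower "$1") in
        bzip2)  echo org.apache.hadoop.io.compress.BZip2Codec ;;
        gzip)   echo org.apache.hadoop.io.compress.GzipCodec ;;
        snappy) echo org.apache.hadoop.io.compress.SnappyCodec ;;
        # Assumption: LZO support comes from the third-party hadoop-lzo library.
        lzo)    echo com.hadoop.compression.lzo.LzopCodec ;;
        *)      echo "unknown codec: $1" >&2; return 1 ;;
    esac
}

# is_splittable NAME: succeed only for codecs the table lists as splittable.
is_splittable() {
    case $(lower "$1") in
        bzip2|lzo) return 0 ;;
        *)         return 1 ;;
    esac
}

# compress_args NAME: print the Sqoop options that select the codec.
compress_args() {
    class=$(codec_class "$1") || return 1
    echo "--compress --compression-codec $class"
}

# Example usage:
#   is_splittable bzip2 && echo "bzip2 output can be processed in parallel"
#   sqoop import <your connection options> $(compress_args bzip2)
```

Keeping the mapping in one place makes it harder to pass a nonsplittable codec by accident for data that later jobs need to read in parallel.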

2.7. Speeding Up Transfers

Problem

Sqoop is a great tool, and it processes bulk transfers very well. Can Sqoop run faster?

Solution

For some databases you can take advantage of the direct mode by using the --direct parameter:

sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--table cities \
--direct
