sqoop import --compress \
--compression-codec org.apache.hadoop.io.compress.BZip2Codec
Another benefit of leveraging MapReduce's compression abilities is that Sqoop can make
use of all Hadoop compression codecs out of the box. You don't need to enable
compression codecs within Sqoop itself. That said, Sqoop can't use any compression
algorithm that Hadoop doesn't know about. Before using a codec with Sqoop, make sure it
is properly installed and configured on all nodes in your cluster.
As Sqoop delegates compression to the MapReduce engine, you need
to make sure that compressed output is allowed in your Hadoop
configuration. For example, if the mapred-site.xml file sets the
property mapred.output.compress to false with the final flag, then
Sqoop won't be able to compress the output files even when you call it
with the --compress parameter.
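One way to check for the situation described above is to scan the configuration file before running the import. The following is only a rough sketch: the function name compress_locked_off is hypothetical, the location of mapred-site.xml depends on your installation, and the awk script assumes a conventionally formatted file rather than doing real XML parsing.

```shell
# Hypothetical helper: succeeds (exit status 0) when the given Hadoop config
# file sets mapred.output.compress to false and marks it <final>true</final>.
# Assumption: plain, conventionally formatted XML; this is not a real XML parser.
compress_locked_off() {
  conf_file="$1"
  awk '
    /<property>/   { name = ""; value = ""; final = "" }
    /<name>/       { name = $0;  gsub(/.*<name>|<\/name>.*/, "", name) }
    /<value>/      { value = $0; gsub(/.*<value>|<\/value>.*/, "", value) }
    /<final>/      { final = $0; gsub(/.*<final>|<\/final>.*/, "", final) }
    /<\/property>/ {
      if (name == "mapred.output.compress" && value == "false" && final == "true")
        found = 1
    }
    END { exit(found ? 0 : 1) }
  ' "$conf_file"
}
```

You might call it as compress_locked_off /path/to/mapred-site.xml and print a warning when it succeeds, since in that case the --compress parameter would have no effect.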
The selected compression codec might have a significant impact on subsequent
processing. Some codecs do not support seeking to the middle of the compressed file
without reading all previous content, effectively preventing Hadoop from processing
the input files in a parallel manner. You should use a splittable codec for data that you're
planning to use in subsequent processing. Table 2-2 contains a list of splittable and
nonsplittable compression codecs that will help you choose the proper codec for your
use case.
Table 2-2. Compression codecs
Splittable      Not Splittable
BZip2, LZO      GZip, Snappy
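If you choose codecs from scripts, the content of Table 2-2 can be encoded in a small lookup. This is a minimal sketch; the function name codec_splittability is hypothetical, and it covers only the four codecs listed in the table.

```shell
# Hypothetical helper encoding Table 2-2: prints "splittable" or
# "not splittable" for a codec name, and fails for codecs not in the table.
codec_splittability() {
  # Lowercase the name so "BZip2" and "bzip2" are treated alike.
  codec=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$codec" in
    bzip2|lzo)   echo "splittable" ;;
    gzip|snappy) echo "not splittable" ;;
    *)           echo "unknown codec: $1" >&2; return 1 ;;
  esac
}
```

A wrapper script could call this before an import and warn when a nonsplittable codec is selected for data that will be processed further.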
2.7. Speeding Up Transfers
Problem
Sqoop is a great tool, and it processes bulk transfers very well. Can Sqoop run faster?
Solution
For some databases you can take advantage of the direct mode by using the --direct
parameter:
sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--table cities \
--direct
 