Table 5-4. Compression library implementations

Compression format    Java implementation?    Native implementation?
DEFLATE               Yes                     Yes
gzip                  Yes                     Yes
bzip2                 Yes                     Yes
LZO                   No                      Yes
LZ4                   No                      Yes
Snappy                No                      Yes
The Apache Hadoop binary tarball comes with prebuilt native compression binaries for 64-bit Linux, called libhadoop.so. For other platforms, you will need to compile the libraries yourself, following the BUILDING.txt instructions at the top level of the source tree.
The native libraries are picked up using the Java system property java.library.path. The hadoop script in the bin directory sets this property for you, but if you don't use this script, you will need to set the property in your application.
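For instance, when launching an application directly with java, the property can be set on the command line. The library path shown here is only an assumption; substitute the directory where your build placed libhadoop.so:

```
java -Djava.library.path=/usr/local/hadoop/lib/native MyApplication
```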
By default, Hadoop looks for native libraries for the platform it is running on, and loads them automatically if they are found. This means you don't have to change any configuration settings to use the native libraries. In some circumstances, however, you may wish to disable use of native libraries, such as when you are debugging a compression-related problem. You can do this by setting the property io.native.lib.available to false, which ensures that the built-in Java equivalents will be used (if they are available).
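As a minimal configuration fragment, this property could be set in core-site.xml like so:

```xml
<property>
  <name>io.native.lib.available</name>
  <value>false</value>
</property>
```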
CodecPool
If you are using a native library and you are doing a lot of compression or decompression in your application, consider using CodecPool, which allows you to reuse compressors and decompressors, thereby amortizing the cost of creating these objects.
The code in Example 5-3 shows the API, although in this program, which creates only a single Compressor, there is really no need to use a pool.

Example 5-3. A program to compress data read from standard input and write it to standard output using a pooled compressor
public class PooledStreamCompressor {
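The listing breaks off after the class declaration in this excerpt. A self-contained sketch of how the body typically uses CodecPool with the Hadoop compression API follows; the assumption (consistent with similar examples in this chapter) is that the codec class name is passed as the first command-line argument:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Compressor;
import org.apache.hadoop.util.ReflectionUtils;

// A sketch, not necessarily the book's exact listing: compress standard input
// to standard output using a Compressor borrowed from CodecPool.
public class PooledStreamCompressor {
  public static void main(String[] args) throws Exception {
    // Assumed: the codec class name (e.g. a GzipCodec) is the first argument.
    String codecClassname = args[0];
    Class<?> codecClass = Class.forName(codecClassname);
    Configuration conf = new Configuration();
    CompressionCodec codec =
        (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
    Compressor compressor = null;
    try {
      // Borrow a (possibly reused) Compressor from the pool.
      compressor = CodecPool.getCompressor(codec);
      CompressionOutputStream out =
          codec.createOutputStream(System.out, compressor);
      IOUtils.copyBytes(System.in, out, 4096, false);
      out.finish();
    } finally {
      // Return the Compressor to the pool so later callers can reuse it.
      CodecPool.returnCompressor(compressor);
    }
  }
}
```

Note the finally block: returning the compressor even on failure is what makes pooling safe, and finish() flushes the compressed stream without closing System.out.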