Table 5-4. Compression library implementations

Compression format    Java implementation?    Native implementation?
DEFLATE               Yes                     Yes
gzip                  Yes                     Yes
bzip2                 Yes                     Yes
LZO                   No                      Yes
LZ4                   No                      Yes
Snappy                No                      Yes
The Apache Hadoop binary tarball comes with prebuilt native compression binaries for 64-bit Linux, called libhadoop.so. For other platforms, you will need to compile the libraries yourself, following the BUILDING.txt instructions at the top level of the source tree.
The native libraries are picked up using the Java system property java.library.path. The hadoop script in the bin directory sets this property for you, but if you don't use this script, you will need to set the property in your application.
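For instance, when launching an application directly with java, the property can be set on the command line. The library path shown here is only an assumption; substitute the directory where your build placed libhadoop.so:

```
java -Djava.library.path=/usr/local/hadoop/lib/native MyApplication
```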
By default, Hadoop looks for native libraries for the platform it is running on, and loads them automatically if they are found. This means you don't have to change any configuration settings to use the native libraries. In some circumstances, however, you may wish to disable use of native libraries, such as when you are debugging a compression-related problem. You can do this by setting the property io.native.lib.available to false, which ensures that the built-in Java equivalents will be used (if they are available).
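As a minimal configuration fragment, this property could be set in core-site.xml like so:

```xml
<property>
  <name>io.native.lib.available</name>
  <value>false</value>
</property>
```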
CodecPool
If you are using a native library and you are doing a lot of compression or decompression in your application, consider using CodecPool, which allows you to reuse compressors and decompressors, thereby amortizing the cost of creating these objects.
The code in Example 5-3 shows the API, although in this program, which creates only a single Compressor, there is really no need to use a pool.

Example 5-3. A program to compress data read from standard input and write it to standard output using a pooled compressor
public class PooledStreamCompressor {
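The listing breaks off after the class declaration in this excerpt. A self-contained sketch of how the body typically uses CodecPool with the Hadoop compression API follows; the assumption (consistent with similar examples in this chapter) is that the codec class name is passed as the first command-line argument:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Compressor;
import org.apache.hadoop.util.ReflectionUtils;

// A sketch, not necessarily the book's exact listing: compress standard input
// to standard output using a Compressor borrowed from CodecPool.
public class PooledStreamCompressor {
  public static void main(String[] args) throws Exception {
    // Assumed: the codec class name (e.g. a GzipCodec) is the first argument.
    String codecClassname = args[0];
    Class<?> codecClass = Class.forName(codecClassname);
    Configuration conf = new Configuration();
    CompressionCodec codec =
        (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
    Compressor compressor = null;
    try {
      // Borrow a (possibly reused) Compressor from the pool.
      compressor = CodecPool.getCompressor(codec);
      CompressionOutputStream out =
          codec.createOutputStream(System.out, compressor);
      IOUtils.copyBytes(System.in, out, 4096, false);
      out.finish();
    } finally {
      // Return the Compressor to the pool so later callers can reuse it.
      CodecPool.returnCompressor(compressor);
    }
  }
}
```

Note the finally block: returning the compressor even on failure is what makes pooling safe, and finish() flushes the compressed stream without closing System.out.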