    InputStream in = null;
    OutputStream out = null;
    try {
      in = codec.createInputStream(fs.open(inputPath));
      out = fs.create(new Path(outputUri));
      IOUtils.copyBytes(in, out, conf);
    } finally {
      IOUtils.closeStream(in);
      IOUtils.closeStream(out);
    }
  }
}
Once the codec has been found, it is used to strip off the file suffix to form the output filename (via the removeSuffix() static method of CompressionCodecFactory). In this way, a file named file.gz is decompressed to file by invoking the program as follows:
% hadoop FileDecompressor file.gz
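The suffix-stripping step is simple string manipulation. The following is a plain-Java analog of what removeSuffix() does (a sketch for illustration, not Hadoop's actual source; the class name RemoveSuffixDemo is invented here):

```java
public class RemoveSuffixDemo {

    // Mirrors the behavior of CompressionCodecFactory.removeSuffix(String, String):
    // if the filename ends with the given suffix, return it with the suffix
    // removed; otherwise return the filename unchanged.
    static String removeSuffix(String filename, String suffix) {
        if (filename.endsWith(suffix)) {
            return filename.substring(0, filename.length() - suffix.length());
        }
        return filename;
    }

    public static void main(String[] args) {
        System.out.println(removeSuffix("file.gz", ".gz"));   // prints "file"
        System.out.println(removeSuffix("plain.txt", ".gz")); // prints "plain.txt"
    }
}
```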
CompressionCodecFactory loads all the codecs in Table 5-2, except LZO, as well
as any listed in the io.compression.codecs configuration property (Table 5-3). By
default, the property is empty; you would need to alter it only if you have a custom codec
that you wish to register (such as the externally hosted LZO codecs). Each codec knows
its default filename extension, thus permitting CompressionCodecFactory to
search through the registered codecs to find a match for the given extension (if any).
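If you do register a custom codec, the property is set like any other Hadoop configuration property. A sketch of a core-site.xml entry (the class names shown are those used by the externally hosted hadoop-lzo project; substitute your own codec class):

```xml
<property>
  <name>io.compression.codecs</name>
  <value>com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
```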
Table 5-3. Compression codec properties

Property name          Type                         Default value  Description
io.compression.codecs  Comma-separated class names  (empty)        A list of additional CompressionCodec classes for compression/decompression
Native libraries
For performance, it is preferable to use a native library for compression and decompression. For example, in one test, using the native gzip libraries reduced decompression times by up to 50% and compression times by around 10% (compared to the built-in Java implementation). Table 5-4 shows the availability of Java and native implementations for each compression format. All formats have native implementations, but not all have a Java implementation (LZO, for example).
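For context, the "built-in Java implementation" that Hadoop's gzip codec falls back on is layered over the JDK's java.util.zip streams. A minimal round-trip through those streams looks like this (a self-contained sketch; the class name GzipRoundTrip is invented here):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {

    // Compress a byte array with the JDK's gzip stream
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(data);
        gz.close(); // finishes the gzip trailer
        return bos.toByteArray();
    }

    // Decompress back to the original bytes
    static byte[] gunzip(byte[] compressed) throws IOException {
        GZIPInputStream in =
            new GZIPInputStream(new ByteArrayInputStream(compressed));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello hadoop".getBytes("UTF-8");
        byte[] roundTripped = gunzip(gzip(original));
        System.out.println(new String(roundTripped, "UTF-8")); // prints "hello hadoop"
    }
}
```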