output, which is compressed by the CompressionOutputStream. Finally, we call
finish() on CompressionOutputStream, which tells the compressor to finish
writing to the compressed stream, but doesn't close the stream. We can try it out
with the following command line, which compresses the string "Text" using the
StreamCompressor program with the GzipCodec, then decompresses it from
standard input using gunzip:

% echo "Text" | hadoop StreamCompressor org.apache.hadoop.io.compress.GzipCodec \
  | gunzip -
Text
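The same finish-without-close contract exists in the JDK's own gzip stream, so it can be illustrated without the Hadoop codec classes. The following self-contained sketch (using java.util.zip.GZIPOutputStream as a stand-in for CompressionOutputStream; the class name FinishDemo is invented for illustration) compresses the string "Text", calls finish() so the compressed trailer is written while the underlying stream stays open, and then decompresses the bytes to verify the round trip:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class FinishDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        GZIPOutputStream gzOut = new GZIPOutputStream(sink);
        gzOut.write("Text\n".getBytes("UTF-8"));
        // finish() flushes the remaining compressed data and the gzip
        // trailer, but leaves the underlying stream (sink) open, just as
        // finish() on Hadoop's CompressionOutputStream leaves the wrapped
        // stream open for further writes.
        gzOut.finish();

        // Decompress what was written to verify the round trip.
        GZIPInputStream gzIn =
            new GZIPInputStream(new ByteArrayInputStream(sink.toByteArray()));
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = gzIn.read(buf)) != -1) {
            result.write(buf, 0, n);
        }
        System.out.print(result.toString("UTF-8"));
    }
}
```

In the Hadoop version, finish() matters precisely because the wrapped stream (System.out in StreamCompressor) may need to carry more data after the compressed block ends.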
Inferring CompressionCodecs using CompressionCodecFactory
If you are reading a compressed file, normally you can infer which codec to use by
looking at its filename extension. A file ending in .gz can be read with GzipCodec,
and so on. The extensions for each compression format are listed in Table 5-1.
CompressionCodecFactory provides a way of mapping a filename extension to a
CompressionCodec using its getCodec() method, which takes a Path object for
the file in question. Example 5-2 shows an application that uses this feature to
decompress files.

Example 5-2. A program to decompress a compressed file using a codec inferred from
the file's extension
public class FileDecompressor {

  public static void main(String[] args) throws Exception {
    String uri = args[0];
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);

    Path inputPath = new Path(uri);
    CompressionCodecFactory factory = new CompressionCodecFactory(conf);
    CompressionCodec codec = factory.getCodec(inputPath);
    if (codec == null) {
      System.err.println("No codec found for " + uri);
      System.exit(1);
    }

    String outputUri =
      CompressionCodecFactory.removeSuffix(uri, codec.getDefaultExtension());