The base path specified in the write() method of MultipleOutputs is interpreted
relative to the output directory, and because it may contain file path separator characters
(/), it's possible to create subdirectories of arbitrary depth. For example, the following
modification partitions the data by station and year so that each year's data is contained in
a directory named by the station ID (such as 029070-99999/1901/part-r-00000):
@Override
protected void reduce(Text key, Iterable<Text> values, Context context)
    throws IOException, InterruptedException {
  for (Text value : values) {
    parser.parse(value);
    String basePath = String.format("%s/%s/part",
        parser.getStationId(), parser.getYear());
    multipleOutputs.write(NullWritable.get(), value, basePath);
  }
}
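To make the naming concrete, here is a minimal standalone sketch of how a base path turns into an output file name. It does not use Hadoop at all: BasePathDemo and its two methods are hypothetical names chosen for illustration, and the suffix logic simply assumes the part-r-nnnnn pattern described in the text (a task-type letter followed by a five-digit, zero-padded partition number).

```java
import java.util.Locale;

// Standalone illustration only; these names are not part of the Hadoop API.
final class BasePathDemo {

    /** Builds the base path used in the reducer above: stationId/year/part. */
    static String basePath(String stationId, String year) {
        return String.format("%s/%s/part", stationId, year);
    }

    /**
     * Appends the task-type letter ('m' or 'r') and the zero-padded
     * partition number, following the part-r-nnnnn naming pattern.
     */
    static String outputFile(String basePath, char taskType, int partition) {
        return String.format(Locale.ROOT, "%s-%c-%05d", basePath, taskType, partition);
    }

    public static void main(String[] args) {
        String base = basePath("029070-99999", "1901");
        // Prints 029070-99999/1901/part-r-00000
        System.out.println(outputFile(base, 'r', 0));
    }
}
```

Because the separator is part of the base path string itself, adding another path component (for example, a month) would only require changing the format string in basePath().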
MultipleOutputs delegates to the job's OutputFormat. In this example it's a
TextOutputFormat, but more complex setups are possible. For example, you can create
named outputs, each with its own OutputFormat and key and value types (which
may differ from the output types of the mapper or reducer). Furthermore, the mapper or
reducer (or both) may write to multiple output files for each record processed. Consult the
Java documentation for more information.
Lazy Output
FileOutputFormat subclasses will create output (part-r-nnnnn) files, even if they
are empty. Some applications prefer that empty files not be created, which is where
LazyOutputFormat helps. It is a wrapper output format that ensures that the output
file is created only when the first record is emitted for a given partition. To use it, call its
setOutputFormatClass() method with the job (a Job in the new API, or a JobConf in
the old one) and the underlying output format.
Streaming supports a -lazyOutput option to enable LazyOutputFormat.
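The idea behind a lazy wrapper can be sketched without Hadoop: defer opening the file until the first record arrives, so a partition that emits nothing leaves no file behind. The LazyFileWriter class below is a hypothetical illustration of that pattern using only the Java standard library. It is not how LazyOutputFormat is implemented, and the file names are only borrowed from the part-r-nnnnn convention.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of lazy file creation; not a Hadoop class.
final class LazyFileWriter implements AutoCloseable {
    private final Path path;
    private BufferedWriter writer; // stays null until the first record is written

    LazyFileWriter(Path path) {
        this.path = path;
    }

    /** Opens the underlying file on first use, then appends one record per line. */
    void write(String record) throws IOException {
        if (writer == null) {
            Files.createDirectories(path.toAbsolutePath().getParent());
            writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8);
        }
        writer.write(record);
        writer.newLine();
    }

    /** Closes the file if it was ever opened; otherwise does nothing. */
    @Override
    public void close() throws IOException {
        if (writer != null) {
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-demo");
        Path empty = dir.resolve("part-r-00000");
        Path used = dir.resolve("part-r-00001");
        try (LazyFileWriter a = new LazyFileWriter(empty);
             LazyFileWriter b = new LazyFileWriter(used)) {
            b.write("one record"); // only the second "partition" emits anything
        }
        System.out.println(Files.exists(empty)); // false: no record, no file
        System.out.println(Files.exists(used));  // true
    }
}
```

An eager writer would open the file in its constructor, which is the behavior the text attributes to FileOutputFormat subclasses by default.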
Database Output
The output formats for writing to relational databases and to HBase are mentioned in
Database Input (and Output).
[ 55 ] But see the classes in org.apache.hadoop.mapred for the old MapReduce API counterparts.
[ 56 ] This is how the mapper in SortValidator.RecordStatsChecker is implemented.