Files specified with the -files option can be on the local filesystem, on HDFS, or on another
Hadoop-readable filesystem (such as S3). If no scheme is supplied, then the files are assumed
to be local. (This is true even when the default filesystem is not the local filesystem.)
You can also copy archive files (JAR files, ZIP files, tar files, and gzipped tar files) to
your tasks using the -archives option; these are unarchived on the task node. The -libjars
option will add JAR files to the classpath of the mapper and reducer tasks. This is useful
if you haven't bundled library JAR files in your job JAR file.
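As a rough illustration of how these options combine, here is a sketch of a command line
(the JAR, archive, and file names are hypothetical, not part of the example that follows):

% hadoop jar my-job.jar MyJobDriver \
  -files hdfs://namenode/metadata/lookup.txt \
  -archives geo-data.tar.gz \
  -libjars third-party-parser.jar \
  input output

Here lookup.txt is copied to each task node (the explicit hdfs:// scheme overrides the
default interpretation as a local path), geo-data.tar.gz is unarchived on the task node,
and third-party-parser.jar is added to the task classpath.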
Let's see how to use the distributed cache to share a metadata file for station names. The
command we will run is:
% hadoop jar hadoop-examples.jar \
  MaxTemperatureByStationNameUsingDistributedCacheFile \
  -files input/ncdc/metadata/stations-fixed-width.txt input/ncdc/all output
This command will copy the local file stations-fixed-width.txt (no scheme is supplied, so
the path is automatically interpreted as a local file) to the task nodes, so we can use it to
look up station names. The listing for MaxTemperatureByStationNameUsingDistributedCacheFile
appears in Example 9-13.
Example 9-13. Application to find the maximum temperature by station, showing station
names from a lookup table passed as a distributed cache file
public class MaxTemperatureByStationNameUsingDistributedCacheFile
    extends Configured implements Tool {

  static class StationTemperatureMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {

    private NcdcRecordParser parser = new NcdcRecordParser();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      parser.parse(value);
      if (parser.isValidTemperature()) {
        context.write(new Text(parser.getStationId()),
            new IntWritable(parser.getAirTemperature()));
      }
    }
  }

  static class MaxTemperatureReducerWithStationLookup
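The listing breaks off at the declaration of MaxTemperatureReducerWithStationLookup. As a
minimal sketch of how that reducer could use the cached file, the following assumes a helper
class NcdcStationMetadata with an initialize(File) method and a getStationName(String) lookup;
those names, and the exact structure of setup() and reduce(), are assumptions rather than the
book's verbatim code:

  static class MaxTemperatureReducerWithStationLookup
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Assumed helper that parses the fixed-width station metadata file
    private NcdcStationMetadata metadata;

    @Override
    protected void setup(Context context)
        throws IOException, InterruptedException {
      metadata = new NcdcStationMetadata();
      // The cached file appears in the task's working directory under its
      // original name, so it can be opened as an ordinary local file.
      metadata.initialize(new File("stations-fixed-width.txt"));
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      // Replace the station ID key with a human-readable station name
      String stationName = metadata.getStationName(key.toString());

      int maxValue = Integer.MIN_VALUE;
      for (IntWritable value : values) {
        maxValue = Math.max(maxValue, value.get());
      }
      context.write(new Text(stationName), new IntWritable(maxValue));
    }
  }

The key point this sketch illustrates is that the reducer does not need any
distributed-cache-specific API calls: the file shipped with -files is simply read as a local
file from the task's working directory during setup().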