Secondary sort
As described, the reducer will see the records from both sources that have the same
key, but they are not guaranteed to be in any particular order. However, to perform the
join, it is important to have the data from one source before that from the other. For the
weather data join, the station record must be the first of the values seen for each key, so
the reducer can fill in the weather records with the station name and emit them
straightaway. Of course, it would be possible to receive the records in any order if we
buffered them in memory, but this should be avoided because the number of records in
any group may be very large and exceed the amount of memory available to the reducer.
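The benefit of seeing the station record first can be sketched without any Hadoop classes. The following is a minimal illustration, not the reducer used for this join: the class name StreamingJoinSketch, the method joinGroup, and the sample values are all hypothetical. It assumes the values for one key arrive with the station name first, so each weather record can be emitted as soon as it is read and nothing but the station name is held in memory.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class StreamingJoinSketch {

    // Joins the values of a single key group.
    // Assumption: the first value is the station name and every later
    // value is a weather record for that station.
    static List<String> joinGroup(Iterator<String> values) {
        List<String> joined = new ArrayList<>();
        if (!values.hasNext()) {
            return joined; // empty group, nothing to emit
        }
        String stationName = values.next(); // the only value kept in memory
        while (values.hasNext()) {
            // Each weather record is combined with the station name and
            // emitted immediately, without buffering the group.
            joined.add(stationName + "\t" + values.next());
        }
        return joined;
    }

    public static void main(String[] args) {
        // Made-up values standing in for one key group.
        List<String> group = List.of("StationA", "record-1", "record-2");
        for (String line : joinGroup(group.iterator())) {
            System.out.println(line);
        }
    }
}
```

The sketch collects its output in a list only so that it can be inspected; a real reducer would write each joined record to its output as it goes.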
We saw in Secondary Sort how to impose an order on the values for each key that the
reducers see, so we use this technique here.
To tag each record, we use TextPair (discussed in Chapter 5) for the keys, which store
the station ID and the tag. The only requirement for the tag values is that they sort in such a
way that the station records come before the weather records. This can be achieved by
tagging station records as 0 and weather records as 1. The mapper classes to do this are
shown in Examples 9-9 and 9-10.
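To see why the tags 0 and 1 give the required order, the sort can be imitated in plain Java. This is only a sketch under stated assumptions: TaggedKey is a hypothetical stand-in for a (station ID, tag) pair, not the TextPair class itself, and the station IDs "A" and "B" are made up. Keys are compared by station ID first and by tag second, so within each station the record tagged 0 sorts ahead of those tagged 1.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TagOrderSketch {

    // Hypothetical stand-in for a composite (station ID, tag) key.
    static final class TaggedKey {
        final String stationId;
        final String tag;

        TaggedKey(String stationId, String tag) {
            this.stationId = stationId;
            this.tag = tag;
        }

        @Override
        public String toString() {
            return stationId + "/" + tag;
        }
    }

    // Order by station ID, then by tag, so "0" precedes "1" within a station.
    static final Comparator<TaggedKey> SORT_ORDER =
        Comparator.comparing((TaggedKey k) -> k.stationId)
                  .thenComparing(k -> k.tag);

    // Returns a sorted copy, leaving the input list untouched.
    static List<TaggedKey> sorted(List<TaggedKey> keys) {
        List<TaggedKey> copy = new ArrayList<>(keys);
        copy.sort(SORT_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<TaggedKey> keys = List.of(
            new TaggedKey("B", "1"),   // weather record, station B
            new TaggedKey("A", "1"),   // weather record, station A
            new TaggedKey("B", "0"),   // station record, station B
            new TaggedKey("A", "0"));  // station record, station A
        for (TaggedKey key : sorted(keys)) {
            System.out.println(key);
        }
    }
}
```

Sorting alone is not the whole technique: in a MapReduce job the records must also be partitioned and grouped on the station ID part of the key only, so that a station record and its weather records reach the same reduce call.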
Example 9-9. Mapper for tagging station records for a reduce-side join
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class JoinStationMapper
    extends Mapper<LongWritable, Text, TextPair, Text> {

  private NcdcStationMetadataParser parser = new NcdcStationMetadataParser();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    if (parser.parse(value)) {
      // Tag station records with "0" so they sort before weather records
      context.write(new TextPair(parser.getStationId(), "0"),
          new Text(parser.getStationName()));
    }
  }
}
Example 9-10. Mapper for tagging weather records for a reduce-side join
public class JoinRecordMapper
    extends Mapper<LongWritable, Text, TextPair, Text> {

  private NcdcRecordParser parser = new NcdcRecordParser();

  @Override
  protected void map(LongWritable key, Text value, Context context)