        }
        return TEXT_COMPARATOR.compare(b1, s1 + firstL1, l1 - firstL1,
                                       b2, s2 + firstL2, l2 - firstL2);
      } catch (IOException e) {
        throw new IllegalArgumentException(e);
      }
    }
  }

  static {
    WritableComparator.define(TextPair.class, new Comparator());
  }
We subclass WritableComparator rather than implementing RawComparator
directly because it provides some convenience methods and default implementations.
The subtle part of this code is calculating firstL1 and firstL2, the lengths of the
first Text field in each byte stream. Each is the sum of two numbers: the size in bytes
of the variable-length integer that prefixes the field (returned by decodeVIntSize() on
WritableUtils) and the value that integer encodes, which is the byte length of the
string itself (returned by readVInt()).
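
The length calculation can be illustrated without Hadoop on the classpath. The following is a minimal sketch, not the library code: decodeVIntSize() and readVInt() here are local re-implementations written to follow the variable-length integer layout as I understand it, and the class name FirstFieldLength and the method firstFieldLength() are hypothetical names introduced for illustration. In real code you would call WritableUtils.decodeVIntSize() and WritableComparator's readVInt() instead.

```java
import java.nio.charset.StandardCharsets;

final class FirstFieldLength {

    // Local stand-in for the library method of the same name (an assumption of
    // this sketch): the first byte alone tells how many bytes the integer uses.
    static int decodeVIntSize(byte first) {
        if (first >= -112) {
            return 1;               // small value stored directly in this byte
        } else if (first < -120) {
            return -119 - first;    // marker for a negative multi-byte value
        }
        return -111 - first;        // marker for a positive multi-byte value
    }

    // Local stand-in: decodes the integer that starts at bytes[start].
    static int readVInt(byte[] bytes, int start) {
        byte first = bytes[start];
        int size = decodeVIntSize(first);
        if (size == 1) {
            return first;
        }
        long value = 0;
        for (int i = 1; i < size; i++) {
            value = (value << 8) | (bytes[start + i] & 0xFF);
        }
        return (int) (first < -120 ? ~value : value);
    }

    // Total bytes occupied by the first field: the size of its length prefix
    // plus the number of string bytes that the prefix announces.
    static int firstFieldLength(byte[] bytes, int start) {
        return decodeVIntSize(bytes[start]) + readVInt(bytes, start);
    }

    public static void main(String[] args) {
        // Hand-built record: "hadoop" (6 bytes, one-byte prefix), then an empty field.
        byte[] word = "hadoop".getBytes(StandardCharsets.UTF_8);
        byte[] pair = new byte[1 + word.length + 1];
        pair[0] = (byte) word.length;
        System.arraycopy(word, 0, pair, 1, word.length);
        System.out.println(firstFieldLength(pair, 0)); // prints 7
    }
}
```

For a string of 128 bytes or more the prefix itself grows beyond one byte, which is why the prefix size has to be added rather than assumed to be 1.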
The static block registers the raw comparator so that whenever MapReduce sees the
TextPair class, it knows to use the raw comparator as its default comparator.
Custom comparators
As you can see with TextPair, writing raw comparators takes some care because you
have to deal with details at the byte level. It is worth looking at some of the
implementations of Writable in the org.apache.hadoop.io package for further ideas if
you need to write your own. The utility methods on WritableUtils are very handy, too.
Custom comparators should also be written as RawComparators, if possible. These
are comparators that implement a different sort order from the natural sort order defined
by the default comparator. Example 5-9 shows a comparator for TextPair, called
FirstComparator, that considers only the first string of the pair. Note that we
override the compare() method that takes objects so that both compare() methods have
the same semantics.
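
To make the point about matching semantics concrete, here is a small self-contained sketch. It is not the book's Example 5-9 and does not use the Hadoop classes: Pair is a hypothetical stand-in for TextPair, and the wire format is deliberately simplified to a single length byte per field (so fields must be under 128 bytes), whereas Text uses a variable-length integer prefix. The only idea it is meant to show is that the byte-level and object-level comparisons both ignore the second field and agree with each other.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class FirstFieldComparatorSketch {

    // Hypothetical stand-in for TextPair: just two strings.
    static final class Pair {
        final String first;
        final String second;

        Pair(String first, String second) {
            this.first = first;
            this.second = second;
        }
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Simplified format (an assumption of this sketch): each field is one
    // length byte followed by its UTF-8 bytes.
    static byte[] serialize(Pair p) {
        byte[] f = utf8(p.first);
        byte[] s = utf8(p.second);
        if (f.length > 127 || s.length > 127) {
            throw new IllegalArgumentException("field too long for this sketch");
        }
        byte[] out = new byte[2 + f.length + s.length];
        out[0] = (byte) f.length;
        System.arraycopy(f, 0, out, 1, f.length);
        out[1 + f.length] = (byte) s.length;
        System.arraycopy(s, 0, out, 2 + f.length, s.length);
        return out;
    }

    // Raw comparison: reads only the first field of each serialized record.
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        int firstLen1 = b1[s1];
        int firstLen2 = b2[s2];
        return Arrays.compareUnsigned(b1, s1 + 1, s1 + 1 + firstLen1,
                                      b2, s2 + 1, s2 + 1 + firstLen2);
    }

    // Object comparison with the same semantics: the second field is ignored
    // and the first fields are ordered by their unsigned UTF-8 bytes.
    static int compareObjects(Pair a, Pair b) {
        return Arrays.compareUnsigned(utf8(a.first), utf8(b.first));
    }

    public static void main(String[] args) {
        Pair a = new Pair("apple", "zzz");
        Pair b = new Pair("apple", "aaa");
        Pair c = new Pair("banana", "aaa");
        System.out.println(compareRaw(serialize(a), 0, serialize(b), 0)); // prints 0
        System.out.println(Integer.signum(compareObjects(a, c)));         // prints -1
    }
}
```

If only the byte-level method were customized, sorting serialized records and sorting deserialized objects could produce different orders, which is the inconsistency the override avoids.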
We will make use of this comparator in Chapter 9, when we look at joins and secondary
sorting in MapReduce (see Joins).
Example 5-9. A custom RawComparator for comparing the first field of TextPair byte
representations