Text objects it can store, TextPair differs from TextArrayWritable (which we discussed in the previous section), since TextArrayWritable is only a Writable, not a WritableComparable.
Implementing a RawComparator for speed
The code for TextPair in Example 5-7 will work as it stands; however, there is a further optimization we can make. As explained in "WritableComparable and comparators", when TextPair is being used as a key in MapReduce, it will have to be deserialized into an object for the compareTo() method to be invoked. What if it were possible to compare two TextPair objects just by looking at their serialized representations?
It turns out that we can do this because TextPair is the concatenation of two Text objects, and the binary representation of a Text object is a variable-length integer containing the number of bytes in the UTF-8 representation of the string, followed by the UTF-8 bytes themselves. The trick is to read the initial length so we know how long the first Text object's byte representation is; then we can delegate to Text's RawComparator and invoke it with the appropriate offsets for the first or second string. Example 5-8 gives the details (note that this code is nested in the TextPair class).
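To make the byte layout concrete, here is a minimal, dependency-free sketch of the same trick. It is not Hadoop code: the class RawPairDemo and its helpers (writeVInt, readVInt, decodeVIntSize, compareBytes, compareText, compareRaw, serializePair) are hypothetical stand-ins. They re-implement, for non-negative lengths only, the variable-length integer layout used by Hadoop's WritableUtils (values up to 127 in one byte; larger values as a marker byte followed by big-endian bytes), so the example runs without Hadoop on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Sketch of raw TextPair comparison. ASSUMPTION: these helpers are hypothetical
// stand-ins that mimic Hadoop's VInt format for non-negative values only; they
// are not the WritableUtils or WritableComparator API.
final class RawPairDemo {

  // Non-negative VInt: 0..127 is one byte; otherwise a marker byte (-112 - n)
  // followed by the n significant bytes of the value, big-endian.
  static void writeVInt(ByteArrayOutputStream out, int v) {
    if (v < 0) throw new IllegalArgumentException("sketch handles non-negative values only");
    if (v <= 127) {
      out.write(v);
      return;
    }
    int n = 0;
    for (int tmp = v; tmp != 0; tmp >>>= 8) n++;
    out.write(-112 - n);
    for (int i = n - 1; i >= 0; i--) {
      out.write((v >>> (8 * i)) & 0xFF);
    }
  }

  // Total encoded size (marker plus payload) given the first byte.
  static int decodeVIntSize(byte first) {
    return first >= -112 ? 1 : -111 - first;
  }

  // Decode the VInt value that starts at bytes[start].
  static int readVInt(byte[] bytes, int start) {
    int size = decodeVIntSize(bytes[start]);
    if (size == 1) return bytes[start];
    int v = 0;
    for (int i = 1; i < size; i++) {
      v = (v << 8) | (bytes[start + i] & 0xFF);
    }
    return v;
  }

  // Text-style encoding: VInt byte length, then the UTF-8 bytes.
  static void writeText(ByteArrayOutputStream out, String s) {
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
    writeVInt(out, utf8.length);
    out.write(utf8, 0, utf8.length);
  }

  // A pair is just two Text encodings back to back.
  static byte[] serializePair(String first, String second) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeText(out, first);
    writeText(out, second);
    return out.toByteArray();
  }

  // Unsigned lexicographic comparison of two byte ranges.
  static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    int n = Math.min(l1, l2);
    for (int i = 0; i < n; i++) {
      int a = b1[s1 + i] & 0xFF;
      int b = b2[s2 + i] & 0xFF;
      if (a != b) return a - b;
    }
    return l1 - l2;
  }

  // Compare two serialized Text values by skipping each length prefix.
  static int compareText(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    int n1 = decodeVIntSize(b1[s1]);
    int n2 = decodeVIntSize(b2[s2]);
    return compareBytes(b1, s1 + n1, l1 - n1, b2, s2 + n2, l2 - n2);
  }

  // Find where the first Text ends, compare the first strings, and only on a
  // tie compare the remaining bytes (the second strings).
  static int compareRaw(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    int firstL1 = decodeVIntSize(b1[s1]) + readVInt(b1, s1);
    int firstL2 = decodeVIntSize(b2[s2]) + readVInt(b2, s2);
    int cmp = compareText(b1, s1, firstL1, b2, s2, firstL2);
    if (cmp != 0) return cmp;
    return compareText(b1, s1 + firstL1, l1 - firstL1, b2, s2 + firstL2, l2 - firstL2);
  }

  public static void main(String[] args) {
    byte[] x = serializePair("apple", "zebra");
    byte[] y = serializePair("apple", "mango");
    // The first strings tie, so "zebra" vs "mango" decides.
    System.out.println(Integer.signum(compareRaw(x, 0, x.length, y, 0, y.length))); // prints 1
  }
}
```

As in Example 5-8, the only decoding done is of the first string's length prefix; no String or TextPair objects are created during a comparison. Because the bytes are compared as unsigned UTF-8, the resulting order is byte (code point) order, which matches String.compareTo for ASCII strings like those above.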
Example 5-8. A RawComparator for comparing TextPair byte representations
public static class Comparator extends WritableComparator {
  private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();
  public Comparator() {
    super(TextPair.class);
  }
  @Override
  public int compare(byte[] b1, int s1, int l1,
                     byte[] b2, int s2, int l2) {
    try {
      int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
      int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
      int cmp = TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
      if (cmp != 0) {
        return cmp;