Comparable<T> {
}
Comparison of types is crucial for MapReduce, where there is a sorting phase during
which keys are compared with one another. One optimization that Hadoop provides is the
RawComparator extension of Java's Comparator:
package org.apache.hadoop.io;

import java.util.Comparator;

public interface RawComparator<T> extends Comparator<T> {
  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}
This interface permits implementors to compare records read from a stream without
deserializing them into objects, thereby avoiding any overhead of object creation. For
example, the comparator for IntWritables implements the raw compare() method by
reading an integer from each of the byte arrays b1 and b2, using the given start
positions (s1 and s2) and lengths (l1 and l2), and comparing the two values directly.
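To make the raw comparison concrete, here is a minimal sketch that needs no Hadoop classes. It assumes each value is stored as a 4-byte big-endian integer, which is what DataOutput.writeInt() produces. The class name RawIntCompareSketch and the helpers readInt() and toBytes() are hypothetical names chosen for illustration; this is not Hadoop's own source.

```java
import java.nio.ByteBuffer;

final class RawIntCompareSketch {
    // Decode a 4-byte big-endian int starting at offset `start` (illustrative helper).
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xff) << 24)
             | ((bytes[start + 1] & 0xff) << 16)
             | ((bytes[start + 2] & 0xff) << 8)
             | (bytes[start + 3] & 0xff);
    }

    // Same parameter list as RawComparator.compare(): no objects are created,
    // the two ints are read straight out of the byte arrays.
    // The lengths l1 and l2 are unused here because an int is always 4 bytes.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int thisValue = readInt(b1, s1);
        int thatValue = readInt(b2, s2);
        return Integer.compare(thisValue, thatValue);
    }

    // Encode an int as 4 big-endian bytes (ByteBuffer is big-endian by default).
    static byte[] toBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    public static void main(String[] args) {
        byte[] b1 = toBytes(163);
        byte[] b2 = toBytes(67);
        System.out.println(compare(b1, 0, b1.length, b2, 0, b2.length) > 0); // prints true
    }
}
```

Because the start offsets are passed in, the same method works when a record sits in the middle of a larger buffer, which is the situation during a sort over serialized keys.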
WritableComparator is a general-purpose implementation of RawComparator for
WritableComparable classes. It provides two main functions. First, it provides a
default implementation of the raw compare() method that deserializes the objects to be
compared from the stream and invokes the object compare() method. Second, it acts as
a factory for RawComparator instances (that Writable implementations have
registered). For example, to obtain a comparator for IntWritable, we just use:
RawComparator<IntWritable> comparator =
    WritableComparator.get(IntWritable.class);
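The first of the two roles described above, the deserializing fallback, can be sketched with the standard library alone. The class IntBox below is a hypothetical stand-in for a WritableComparable holding one int, and DeserializingCompareSketch is likewise an illustrative name; the code only mirrors the idea of "rebuild both objects from bytes, then call the object comparison" and is not WritableComparator's actual implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class DeserializingCompareSketch {
    // Hypothetical stand-in for a WritableComparable that wraps a single int.
    static final class IntBox implements Comparable<IntBox> {
        int value;
        IntBox() {}
        IntBox(int value) { this.value = value; }
        void write(DataOutputStream out) throws IOException { out.writeInt(value); }
        void readFields(DataInputStream in) throws IOException { value = in.readInt(); }
        @Override public int compareTo(IntBox other) { return Integer.compare(value, other.value); }
    }

    // Turn an object into its serialized bytes.
    static byte[] serialize(IntBox box) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            box.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuild an object from l bytes of b, starting at offset s.
    static IntBox deserialize(byte[] b, int s, int l) {
        IntBox box = new IntBox();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(b, s, l))) {
            box.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return box;
    }

    // Fallback raw comparison: deserialize both sides, then compare the objects.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        IntBox first = deserialize(b1, s1, l1);
        IntBox second = deserialize(b2, s2, l2);
        return first.compareTo(second);
    }
}
```

This fallback allocates two objects per comparison, which is the overhead a type-specific raw comparator avoids. A comparator obtained through the factory role can skip that cost when a raw implementation has been registered for the type.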
The comparator can be used to compare two IntWritable objects:
IntWritable w1 = new IntWritable(163);
IntWritable w2 = new IntWritable(67);
assertThat(comparator.compare(w1, w2), greaterThan(0));
or their serialized representations:
byte[] b1 = serialize(w1);
byte[] b2 = serialize(w2);
assertThat(comparator.compare(b1, 0, b1.length, b2, 0,