Hadoop I/O - Hadoop: The Definitive Guide


Comparable<T> {
}

Comparison of types is crucial for MapReduce, where there is a sorting phase during which keys are compared with one another. One optimization that Hadoop provides is the RawComparator extension of Java's Comparator:

package org.apache.hadoop.io;

import java.util.Comparator;

public interface RawComparator<T> extends Comparator<T> {

  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);

}

This interface permits implementors to compare records read from a stream without deserializing them into objects, thereby avoiding any overhead of object creation. For example, the comparator for IntWritables implements the raw compare() method by reading an integer from each of the byte arrays b1 and b2 and comparing them directly from the given start positions (s1 and s2) and lengths (l1 and l2).
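To make the idea concrete, here is a minimal, self-contained sketch that sorts serialized int keys without turning them back into objects. It assumes the 4-byte big-endian layout that DataOutput.writeInt() produces (the format IntWritable writes); the class RawIntComparatorSketch and its methods are hypothetical names for illustration, not Hadoop's actual IntWritable comparator.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a raw comparator; not Hadoop's own code.
public class RawIntComparatorSketch {

    // Decodes a big-endian 4-byte int starting at offset s, the same
    // layout that DataOutput.writeInt() produces.
    public static int readInt(byte[] b, int s) {
        return ((b[s] & 0xff) << 24)
             | ((b[s + 1] & 0xff) << 16)
             | ((b[s + 2] & 0xff) << 8)
             | (b[s + 3] & 0xff);
    }

    // Same shape as RawComparator.compare(byte[], int, int, byte[], int, int):
    // compares two serialized ints in place, allocating no objects.
    public static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int v1 = readInt(b1, s1);
        int v2 = readInt(b2, s2);
        return Integer.compare(v1, v2);
    }

    // Serializes an int the way a DataOutput-based write() method would.
    public static byte[] serialize(int value) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream dataOut = new DataOutputStream(out)) {
            dataOut.writeInt(value);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Simulate a sort phase over keys that exist only in serialized form.
        List<byte[]> keys = new ArrayList<>();
        for (int v : new int[] {163, 67, -5, 1024}) {
            keys.add(serialize(v));
        }
        keys.sort((a, b) -> compare(a, 0, a.length, b, 0, b.length));
        for (byte[] k : keys) {
            System.out.println(readInt(k, 0));
        }
    }
}
```

Running main prints -5, 67, 163 and 1024 on separate lines. Decoding the int (rather than comparing the bytes as unsigned values) matters for negative numbers: -5 is stored as 0xFFFFFFFB, which an unsigned byte-by-byte comparison would place after every positive value.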

WritableComparator is a general-purpose implementation of RawComparator for WritableComparable classes. It provides two main functions. First, it provides a default implementation of the raw compare() method that deserializes the objects to be compared from the stream and invokes the object compare() method. Second, it acts as a factory for the RawComparator instances that Writable implementations have registered. For example, to obtain a comparator for IntWritable, we just use:

RawComparator<IntWritable> comparator =
    WritableComparator.get(IntWritable.class);
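The two roles of WritableComparator can be sketched in plain Java. This is a hypothetical illustration, not Hadoop's implementation: RawCmp stands in for RawComparator, Reader is an assumed deserialization hook (real Writables read themselves via readFields()), and register()/get() only loosely mirror the factory behavior. A type with a registered comparator gets it back; any other type falls back to a default that deserializes both records and compares the objects.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a comparator factory with a deserializing fallback.
public class ComparatorRegistrySketch {

    // Same method shape as RawComparator's byte-level compare().
    public interface RawCmp {
        int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
    }

    // Assumed hook that reads one object from a serialized stream.
    public interface Reader<T> {
        T read(DataInput in) throws IOException;
    }

    private static final Map<Class<?>, RawCmp> REGISTRY = new ConcurrentHashMap<>();

    // Factory role: a type with a fast raw comparator registers it here.
    public static void register(Class<?> type, RawCmp cmp) {
        REGISTRY.put(type, cmp);
    }

    // Returns the registered comparator for the type if there is one;
    // otherwise returns a default that deserializes both records and
    // compares the resulting objects (slower, since it creates objects).
    public static <T extends Comparable<T>> RawCmp get(Class<T> type, Reader<T> reader) {
        RawCmp registered = REGISTRY.get(type);
        if (registered != null) {
            return registered;
        }
        return (b1, s1, l1, b2, s2, l2) -> {
            try {
                T o1 = reader.read(new DataInputStream(new ByteArrayInputStream(b1, s1, l1)));
                T o2 = reader.read(new DataInputStream(new ByteArrayInputStream(b2, s2, l2)));
                return o1.compareTo(o2);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    public static void main(String[] args) {
        byte[] a = {0, 0, 0, (byte) 163}; // 163 as a big-endian int
        byte[] b = {0, 0, 0, 67};         // 67 as a big-endian int
        // Nothing is registered for Integer, so this uses the fallback.
        RawCmp fallback = get(Integer.class, DataInput::readInt);
        System.out.println(fallback.compare(a, 0, a.length, b, 0, b.length) > 0);
    }
}
```

Here main prints true. The design trade-off matches the text: the fallback works for any comparable type that can be read from a stream, while a registered comparator can skip object creation entirely.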

The comparator can be used to compare two IntWritable objects:

IntWritable w1 = new IntWritable(163);
IntWritable w2 = new IntWritable(67);
assertThat(comparator.compare(w1, w2), greaterThan(0));

or their serialized representations:

byte[] b1 = serialize(w1);
byte[] b2 = serialize(w2);
assertThat(comparator.compare(b1, 0, b1.length, b2, 0,
