Comparable<T> {
}
Comparison of types is crucial for MapReduce, where there is a sorting phase during which keys are compared with one another. One optimization that Hadoop provides is the RawComparator extension of Java's Comparator:
package org.apache.hadoop.io;

import java.util.Comparator;

public interface RawComparator<T> extends Comparator<T> {

  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);

}
This interface permits implementors to compare records read from a stream without deserializing them into objects, thereby avoiding any overhead of object creation. For example, the comparator for IntWritables implements the raw compare() method by reading an integer from each of the byte arrays b1 and b2 and comparing them directly from the given start positions (s1 and s2) and lengths (l1 and l2).
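The raw comparison described above can be sketched in plain Java, without any Hadoop classes. This is an illustrative approximation, not Hadoop's actual source: the class RawIntCompareSketch and the helpers readInt, compareRaw, and toBytes are hypothetical names, and the sketch assumes each integer is stored as four big-endian bytes, which is the layout produced by DataOutput.writeInt().

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for an IntWritable raw comparator; not Hadoop code.
class RawIntCompareSketch {

    // Decode a 4-byte big-endian int that starts at offset s.
    public static int readInt(byte[] bytes, int s) {
        return ((bytes[s] & 0xff) << 24)
             | ((bytes[s + 1] & 0xff) << 16)
             | ((bytes[s + 2] & 0xff) << 8)
             |  (bytes[s + 3] & 0xff);
    }

    // Same parameter shape as RawComparator.compare(). The lengths l1 and l2
    // are unused here because an int always occupies exactly four bytes.
    public static int compareRaw(byte[] b1, int s1, int l1,
                                 byte[] b2, int s2, int l2) {
        int thisValue = readInt(b1, s1);
        int thatValue = readInt(b2, s2);
        return Integer.compare(thisValue, thatValue);
    }

    // Assumed serialization helper: ByteBuffer writes big-endian by default.
    public static byte[] toBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    public static void main(String[] args) {
        byte[] b1 = toBytes(163);
        byte[] b2 = toBytes(67);
        // No objects are created for the two values; only primitives are read.
        System.out.println(compareRaw(b1, 0, b1.length, b2, 0, b2.length) > 0);
    }
}
```

The values are decoded before being compared, rather than compared byte by byte, because a signed integer's byte order does not match its numeric order: a negative number has its high bit set and would sort after every positive number in a naive unsigned byte comparison.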
WritableComparator is a general-purpose implementation of RawComparator for WritableComparable classes. It provides two main functions. First, it provides a default implementation of the raw compare() method that deserializes the objects to be compared from the stream and invokes the object compare() method. Second, it acts as a factory for RawComparator instances (that Writable implementations have registered). For example, to obtain a comparator for IntWritable, we just use:
RawComparator<IntWritable> comparator = WritableComparator.get(IntWritable.class);
The comparator can be used to compare two IntWritable objects:
IntWritable w1 = new IntWritable(163);
IntWritable w2 = new IntWritable(67);
assertThat(comparator.compare(w1, w2), greaterThan(0));
or their serialized representations:
byte[] b1 = serialize(w1);
byte[] b2 = serialize(w2);
assertThat(comparator.compare(b1, 0, b1.length, b2, 0,