      }
      return TEXT_COMPARATOR.compare(b1, s1 + firstL1, l1 - firstL1,
                                     b2, s2 + firstL2, l2 - firstL2);
    } catch (IOException e) {
      throw new IllegalArgumentException(e);
    }
  }
}

static {
  WritableComparator.define(TextPair.class, new Comparator());
}
We actually subclass WritableComparator rather than implementing RawComparator directly, since it provides some convenience methods and default implementations. The subtle part of this code is calculating firstL1 and firstL2, the lengths of the first Text field in each byte stream. Each is made up of the length of the variable-length integer (returned by decodeVIntSize() on WritableUtils) and the value it is encoding (returned by readVInt()).
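Concretely, for a first Text field shorter than 128 bytes, the length vint occupies a single byte whose value is the field length itself, so the whole field spans one length byte plus the string bytes. The arithmetic can be sketched in plain Java; the two helpers below are simplified, single-byte-only stand-ins for Hadoop's WritableUtils.decodeVIntSize() and WritableComparator.readVInt(), not the real implementations:

```java
// Plain-Java sketch of the firstL1 calculation; covers only single-byte vints
// (field lengths 0-127).
public class FirstFieldLength {
    // Size in bytes of the vint whose first byte is 'first' (simplified stand-in).
    static int decodeVIntSize(byte first) {
        if (first >= -112) return 1; // single-byte encoding
        throw new UnsupportedOperationException("multi-byte vints not sketched");
    }

    // Value of the vint starting at 'start' (simplified stand-in, single-byte case).
    static int readVInt(byte[] bytes, int start) {
        if (bytes[start] >= -112) return bytes[start];
        throw new UnsupportedOperationException("multi-byte vints not sketched");
    }

    public static void main(String[] args) {
        // Serialized pair ("ab", "xyz"): [2, 'a', 'b', 3, 'x', 'y', 'z']
        byte[] b1 = {2, 'a', 'b', 3, 'x', 'y', 'z'};
        // Length of the first field: 1 length byte + 2 string bytes.
        int firstL1 = decodeVIntSize(b1[0]) + readVInt(b1, 0);
        System.out.println(firstL1); // 3
    }
}
```

Adding the vint's own size to the value it encodes is what lets the comparator skip past the first field to find the second one without deserializing anything.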
The static block registers the raw comparator so that whenever MapReduce sees the TextPair class, it knows to use the raw comparator as its default comparator.
Custom comparators
As you can see with TextPair, writing raw comparators takes some care because you have to deal with details at the byte level. It is worth looking at some of the implementations of Writable in the org.apache.hadoop.io package for further ideas if you need to write your own. The utility methods on WritableUtils are very handy, too.
Custom comparators should also be written to be RawComparators, if possible. These are comparators that implement a different sort order from the natural sort order defined by the default comparator. Example 5-9 shows such a comparator for TextPair, FirstComparator, which considers only the first string of the pair. Note that we override the compare() method that takes objects so that both compare() methods have the same semantics.
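The requirement that the two compare() overloads agree can be illustrated with a self-contained, non-Hadoop analogue. The serialization format here, a one-byte length prefix per ASCII string, is a deliberately simplified stand-in for TextPair's vint-prefixed layout:

```java
import java.util.Arrays;

// Non-Hadoop analogue of a "first field only" comparator: the object-level
// and byte-level compare methods implement the same ordering.
public class FirstOnly {
    // Serialize a pair of ASCII strings as [len1, bytes1..., len2, bytes2...],
    // with each length held in a single byte (lengths < 128).
    static byte[] serialize(String first, String second) {
        byte[] f = first.getBytes(), s = second.getBytes();
        byte[] out = new byte[2 + f.length + s.length];
        out[0] = (byte) f.length;
        System.arraycopy(f, 0, out, 1, f.length);
        out[1 + f.length] = (byte) s.length;
        System.arraycopy(s, 0, out, 2 + f.length, s.length);
        return out;
    }

    // Object-level compare: only the first element of the pair matters.
    static int compare(String[] a, String[] b) {
        return Integer.signum(a[0].compareTo(b[0]));
    }

    // Byte-level compare with the same semantics: read each one-byte length
    // prefix and compare just the first field's bytes, never deserializing.
    static int compareBytes(byte[] b1, byte[] b2) {
        int l1 = b1[0], l2 = b2[0];
        return Integer.signum(Arrays.compare(b1, 1, 1 + l1, b2, 1, 1 + l2));
    }

    public static void main(String[] args) {
        byte[] x = serialize("alpha", "zzz");
        byte[] y = serialize("beta", "aaa");
        // Both orderings agree: "alpha" < "beta"; the second fields are ignored.
        System.out.println(compare(new String[]{"alpha", "zzz"},
                                   new String[]{"beta", "aaa"})); // -1
        System.out.println(compareBytes(x, y));                   // -1
    }
}
```

Keeping the two overloads consistent matters because the framework may invoke either path: if the byte-level sort order diverged from the object-level one, results would depend on which code path happened to run.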
We will make use of this comparator in Chapter 9, when we look at joins and secondary sorting.

Example 5-9. A custom RawComparator for comparing the first field of TextPair byte representations