Text objects it can store, TextPair differs from TextArrayWritable (which we discussed in the previous section), since TextArrayWritable is only a Writable, not a WritableComparable.
Implementing a RawComparator for speed
There is a further optimization we can make.
As explained in "WritableComparable and comparators", when TextPair is being used as a key in MapReduce, it will have to be deserialized into an object for the compareTo() method to be invoked. What if it were possible to compare two TextPair objects just by looking at their serialized representations?

It turns out that we can do this because TextPair is the concatenation of two Text objects, and the binary representation of a Text object is a variable-length integer containing the number of bytes in the UTF-8 representation of the string, followed by the UTF-8 bytes themselves. The trick is to read the initial length so we know how long the first Text object's byte representation is; then we can delegate to Text's RawComparator and invoke it with the appropriate offsets for the first or second string. Example 5-8 gives the details (note that this code is nested in the TextPair class).
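To make the byte layout concrete, here is a minimal stand-alone sketch that does not depend on Hadoop. The class RawPairSketch and all of its method names are hypothetical and exist only for illustration. It assumes every string is shorter than 128 UTF-8 bytes, so the length prefix fits in a single byte (Hadoop's variable-length integer encoding uses one byte for values in that range); longer strings would need the full variable-length decoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is not Hadoop code.
final class RawPairSketch {

    // Writes one field as: a single length byte, then the UTF-8 bytes.
    // Assumption: the string is shorter than 128 bytes in UTF-8.
    static void writeField(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 127) {
            throw new IllegalArgumentException("sketch supports at most 127 bytes");
        }
        out.write(utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // A pair is simply the two fields written back to back.
    static byte[] serializePair(String first, String second) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeField(out, first);
        writeField(out, second);
        return out.toByteArray();
    }

    // Total size of the field starting at 'start': length byte plus payload.
    static int fieldSize(byte[] b, int start) {
        return 1 + b[start];
    }

    // Lexicographic comparison of two byte ranges, treating bytes as unsigned.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return l1 - l2;
    }

    // Compares two serialized pairs without rebuilding any String objects.
    static int comparePairs(byte[] b1, byte[] b2) {
        int first1 = fieldSize(b1, 0);
        int first2 = fieldSize(b2, 0);
        // First field: skip the length byte, compare the payloads.
        int cmp = compareBytes(b1, 1, first1 - 1, b2, 1, first2 - 1);
        if (cmp != 0) {
            return cmp;
        }
        // Second field starts right after the first one ends.
        int second1 = fieldSize(b1, first1);
        int second2 = fieldSize(b2, first2);
        return compareBytes(b1, first1 + 1, second1 - 1,
                            b2, first2 + 1, second2 - 1);
    }
}
```

The key step is fieldSize: reading the length prefix tells us where the first field ends, which is also where the second field begins, so both comparisons can work directly on offsets into the raw byte arrays.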
Example 5-8. A RawComparator for comparing TextPair byte representations
public static class Comparator extends WritableComparator {

  private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();

  public Comparator() {
    super(TextPair.class);
  }

  @Override
  public int compare(byte[] b1, int s1, int l1,
                     byte[] b2, int s2, int l2) {
    try {
      // Bytes occupied by the first Text field: the size of its
      // length prefix plus the number of UTF-8 bytes it announces.
      int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
      int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
      // Compare the first fields directly on their serialized bytes.
      int cmp = TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
      if (cmp != 0) {
        return cmp;