der
attributes), for example, Avro implements the binary comparison as follows.
The first field, left, is a UTF-8-encoded string, for which Avro can compare the bytes
lexicographically. If they differ, the order is determined, and Avro can stop the
comparison there. Otherwise, if the two byte sequences are the same, it compares the
second field (right) of each record, again lexicographically at the byte level because
the field is another UTF-8 string.
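The field-by-field logic just described can be sketched in plain Java. This is an illustration only, not Avro's implementation: the class StringPairBinaryOrder and its methods are hypothetical names, and the sketch assumes each field's UTF-8 bytes have already been isolated, whereas a real binary comparator works directly on the serialized record, including the length prefix of each string.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the comparison described above; not part of the Avro API.
final class StringPairBinaryOrder {

    // Compare two byte sequences lexicographically, treating each byte as unsigned.
    // Returns -1, 0, or 1.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        // All shared bytes are equal, so the shorter sequence sorts first.
        return Integer.compare(a.length, b.length);
    }

    // Compare the left fields first; look at the right fields only on a tie.
    static int compare(byte[] left1, byte[] right1, byte[] left2, byte[] right2) {
        int c = compareBytes(left1, left2);
        if (c != 0) {
            return c; // the order is determined, so stop here
        }
        return compareBytes(right1, right2);
    }

    // Helper for building UTF-8 test input.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

Masking each byte with 0xff matters because Java bytes are signed. Without it, any multibyte UTF-8 character would sort before ASCII characters.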
Notice that this description of a comparison function has exactly the same logic as the
binary comparator we wrote for Writables in Implementing a RawComparator for speed. The
great thing is that Avro provides the comparator for us, so we don't have to write and
maintain this code. It's also easy to change the sort order just by changing the reader's
schema. For the SortedStringPair.avsc and SwitchedStringPair.avsc schemas, the comparison
function Avro uses is essentially the same as the one just described. The differences
are which fields are considered, the order in which they are considered, and whether the
sort order is ascending or descending.
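To make those three differences concrete, here is a minimal Java sketch of a comparison driven by a per-field ordering. It is not Avro's code: ReaderSchemaOrder, FieldOrder, and pair are hypothetical names, and the list of FieldOrder values stands in for the field order and order attributes that a reader's schema would supply. As before, the sketch assumes each field's UTF-8 bytes are already isolated.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; the field list stands in for the reader's schema.
final class ReaderSchemaOrder {

    enum Order { ASCENDING, DESCENDING, IGNORE }

    // One field of the (assumed) reader's schema: its position in the record
    // and how it takes part in sorting.
    static final class FieldOrder {
        final int index;
        final Order order;

        FieldOrder(int index, Order order) {
            this.index = index;
            this.order = order;
        }
    }

    // Unsigned lexicographic byte comparison; returns -1, 0, or 1.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Visit fields in the order given, skip ignored fields, and flip the
    // result for descending fields. The first non-zero result decides.
    static int compare(byte[][] r1, byte[][] r2, FieldOrder... fields) {
        for (FieldOrder f : fields) {
            if (f.order == Order.IGNORE) {
                continue;
            }
            int c = compareBytes(r1[f.index], r2[f.index]);
            if (c != 0) {
                return f.order == Order.DESCENDING ? -c : c;
            }
        }
        return 0;
    }

    // Build a two-field record: index 0 is left, index 1 is right.
    static byte[][] pair(String left, String right) {
        return new byte[][] {
            left.getBytes(StandardCharsets.UTF_8),
            right.getBytes(StandardCharsets.UTF_8)
        };
    }
}
```

For example, passing new FieldOrder(1, Order.DESCENDING) followed by new FieldOrder(0, Order.ASCENDING) sorts by the right field in descending order and breaks ties on the left field in ascending order, while marking a field IGNORE leaves it out of the comparison entirely.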
Later in the chapter, we'll use Avro's sorting logic in conjunction with MapReduce to sort
Avro datafiles in parallel.