tly on the byte streams. [83] In the case of the original StringPair schema (with no order
attributes), for example, Avro implements the binary comparison as follows.
The first field, left, is a UTF-8-encoded string, for which Avro can compare the bytes
lexicographically. If they differ, the order is determined, and Avro can stop the comparison
there. Otherwise, if the two byte sequences are the same, it compares the second fields
of the two records (right), again lexicographically at the byte level because the field is
another UTF-8 string.
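The logic just described can be sketched in plain Java. This is an illustrative sketch, not Avro's own comparator: the class StringPairBinaryCompare, its Cursor helper, and the encode method are hypothetical names introduced here. It assumes only the Avro binary layout, in which a record is its fields written one after another and a string is a zigzag variable-length long giving the byte count, followed by the UTF-8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: compares two serialized StringPair records
// (left, right) without turning the bytes back into String objects.
final class StringPairBinaryCompare {

    // A read position in a byte array, advanced as fields are consumed.
    static final class Cursor {
        final byte[] buf;
        int pos;
        Cursor(byte[] buf) { this.buf = buf; }
    }

    // Reads a zigzag variable-length long (used for a string's byte length).
    static long readLong(Cursor c) {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = c.buf[c.pos++] & 0xff;
            raw |= (long) (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1);
    }

    // Compares the next string field of each record as unsigned bytes
    // and moves both cursors past that field.
    static int compareStringField(Cursor a, Cursor b) {
        int lenA = (int) readLong(a);
        int lenB = (int) readLong(b);
        int startA = a.pos;
        int startB = b.pos;
        a.pos += lenA;
        b.pos += lenB;
        int n = Math.min(lenA, lenB);
        for (int i = 0; i < n; i++) {
            int x = a.buf[startA + i] & 0xff;
            int y = b.buf[startB + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return lenA - lenB; // a shorter string that is a prefix sorts first
    }

    // left decides the order if it differs; otherwise right decides.
    static int compare(byte[] r1, byte[] r2) {
        Cursor a = new Cursor(r1);
        Cursor b = new Cursor(r2);
        int c = compareStringField(a, b);
        if (c != 0) {
            return c;
        }
        return compareStringField(a, b);
    }

    // Test helper: writes left and right in the layout assumed above.
    static byte[] encode(String left, String right) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String s : new String[] {left, right}) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            long n = (long) utf8.length << 1; // zigzag of a non-negative value
            while ((n & ~0x7fL) != 0) {
                out.write((int) ((n & 0x7f) | 0x80));
                n >>>= 7;
            }
            out.write((int) n);
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }
}
```

Calling compare(encode("a", "b"), encode("a", "c")) returns a negative number: the left fields are byte-for-byte equal, so the right fields decide the order.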
Notice that this description of a comparison function has exactly the same logic as the binary
comparator we wrote for Writables in "Implementing a RawComparator for speed." The
great thing is that Avro provides the comparator for us, so we don't have to write and
maintain this code. It's also easy to change the sort order just by changing the reader's
schema. For the SortedStringPair.avsc and SwitchedStringPair.avsc schemas, the comparison
function Avro uses is essentially the same as the one just described. The differences
are which fields are considered, the order in which they are considered, and whether the
sort order is ascending or descending.
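As a rough illustration of how those three differences change the comparison, the following sketch parameterizes the field-by-field logic with a per-field order. It is a hypothetical helper, not Avro's implementation and not the contents of either schema file: the class OrderedFieldCompare and its Order enum are names introduced here, mirroring the ascending, descending, and ignore values that a field's order attribute can take. For brevity it works on already-decoded strings, comparing their UTF-8 bytes as unsigned values, and it assumes the arrays list the fields in the order the schema declares them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: field-by-field comparison driven by per-field order.
final class OrderedFieldCompare {

    enum Order { ASCENDING, DESCENDING, IGNORE }

    // a and b hold the field values of two records, in schema field order;
    // orders[i] says how field i takes part in the comparison.
    static int compare(String[] a, String[] b, Order[] orders) {
        for (int i = 0; i < orders.length; i++) {
            if (orders[i] == Order.IGNORE) {
                continue; // this field plays no part in the sort
            }
            int c = Arrays.compareUnsigned(
                    a[i].getBytes(StandardCharsets.UTF_8),
                    b[i].getBytes(StandardCharsets.UTF_8));
            if (c != 0) {
                // the first differing field decides; descending flips the sign
                return orders[i] == Order.DESCENDING ? -c : c;
            }
        }
        return 0;
    }
}
```

With both orders ascending this reduces to the left-then-right comparison described earlier. Marking a field DESCENDING reverses its contribution, and marking it IGNORE removes it from the comparison altogether.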
Later in the chapter, we'll use Avro's sorting logic in conjunction with MapReduce to sort
Avro datafiles in parallel.