data, it will gracefully ignore the new field and carry on processing as it would have done
with old data.
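The behavior described above, where a reader written against an older schema skips a field it does not know about, can be sketched in a few lines of plain Python. This is only an illustration of the idea and does not use the Avro library; the field names and the `project` helper are hypothetical.

```python
# Toy illustration of reader-side projection; this is not the Avro API.
# The field names below are hypothetical.
OLD_READER_FIELDS = ["name", "favorite_number"]


def project(record, reader_fields):
    """Keep only the fields that the reader's schema declares."""
    return {name: record[name] for name in reader_fields}


# A record written before the new field existed, and one written after.
old_record = {"name": "Ada", "favorite_number": 7}
new_record = {"name": "Ada", "favorite_number": 7, "favorite_color": "blue"}

# The old reader sees the same thing in both cases: the extra field is ignored.
assert project(new_record, OLD_READER_FIELDS) == project(old_record, OLD_READER_FIELDS)
```

In real Avro the projection is driven by comparing the writer's schema with the reader's schema, not by a hand-written field list.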
Avro specifies an object container format for sequences of objects, similar to Hadoop's
sequence file. An Avro datafile has a metadata section where the schema is stored, which
makes the file self-describing. Avro datafiles support compression and are splittable,
which is crucial for a MapReduce data input format. In fact, support goes beyond
MapReduce: all of the data processing frameworks in this book (Pig, Hive, Crunch, Spark)
can read and write Avro datafiles.
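To make the idea of a self-describing, compressed, block-structured file concrete, here is a minimal sketch of a toy container written with the Python standard library only. It is loosely modeled on the layout of an Avro datafile (a header holding the schema, then blocks of records separated by a sync marker) but it is not the real format: the `TOY1` magic bytes, the JSON record encoding, and the function names are all assumptions made for illustration.

```python
import io
import json
import os
import struct
import zlib

MAGIC = b"TOY1"  # hypothetical marker, not Avro's real magic bytes


def write_container(schema, records, block_size=2):
    """Write a header carrying the schema, then compressed record blocks."""
    out = io.BytesIO()
    sync = os.urandom(16)  # marker repeated after every block
    header = json.dumps({"schema": schema, "codec": "deflate"}).encode("utf-8")
    out.write(MAGIC)
    out.write(struct.pack(">I", len(header)))
    out.write(header)
    out.write(sync)
    for start in range(0, len(records), block_size):
        block = records[start:start + block_size]
        payload = zlib.compress(json.dumps(block).encode("utf-8"))
        out.write(struct.pack(">II", len(block), len(payload)))
        out.write(payload)
        out.write(sync)
    return out.getvalue()


def read_container(data):
    """Recover the schema from the file itself, then decode every block."""
    buf = io.BytesIO(data)
    if buf.read(4) != MAGIC:
        raise ValueError("not a toy container file")
    (header_len,) = struct.unpack(">I", buf.read(4))
    meta = json.loads(buf.read(header_len).decode("utf-8"))
    sync = buf.read(16)
    records = []
    while True:
        head = buf.read(8)
        if not head:
            break  # end of file
        count, size = struct.unpack(">II", head)
        block = json.loads(zlib.decompress(buf.read(size)).decode("utf-8"))
        if len(block) != count or buf.read(16) != sync:
            raise ValueError("corrupt block")
        records.extend(block)
    return meta["schema"], records


# Hypothetical schema and records, used only to exercise the round trip.
schema = {"type": "record", "name": "Pair",
          "fields": [{"name": "left", "type": "string"},
                     {"name": "right", "type": "int"}]}
records = [{"left": "a", "right": 1}, {"left": "b", "right": 2},
           {"left": "c", "right": 3}]

data = write_container(schema, records)
assert read_container(data) == (schema, records)
```

Because the reader gets the schema from the header, it needs no outside information to decode the file. The sync marker that follows each block is what makes splitting possible in a format of this shape: a reader starting partway through the file can scan forward to the next marker and begin at a block boundary.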
Avro can be used for RPC, too, although this isn't covered here. More information is in
the specification.