Database Reference
In-Depth Information
A Serialization defines a mapping from types to Serializer instances (for turn-
ing an object into a byte stream) and Deserializer instances (for turning a byte
stream into an object).
Set the io.serializations property to a comma-separated list of classnames in or-
der to register Serialization implementations. Its default value includes
org.apache.hadoop.io.serializer.WritableSerialization and the
Avro Specific and Reflect serializations (see Avro Data Types and Schemas ), which
means that only Writable or Avro objects can be serialized or deserialized out of the
box.
Hadoop includes a class called JavaSerialization that uses Java Object Serializa-
tion. Although it makes it convenient to be able to use standard Java types such as In-
teger or String in MapReduce programs, Java Object Serialization is not as efficient
as Writables, so it's not worth making this trade-off (see the following sidebar).
WHY NOT USE JAVA OBJECT SERIALIZATION?
Java comes with its own serialization mechanism, called Java Object Serialization (often referred to
simply as “Java Serialization”), that is tightly integrated with the language, so it's natural to ask why this
wasn't used in Hadoop. Here's what Doug Cutting said in response to that question:
Why didn't I use Serialization when we first started Hadoop? Because it looked big and hairy and I
thoughtweneededsomethingleanandmean,wherewehadprecisecontroloverexactlyhowobjects
are written and read, since that is central to Hadoop. With Serialization you can get some control,
but you have to fight for it.
The logic for not using RMI [Remote Method Invocation] was similar. Effective, high-performance
inter-process communications are critical to Hadoop. I felt like we'd need to precisely control how
things like connections, timeouts and buffers are handled, and RMI gives you little control over
those.
The problem is that Java Serialization doesn't meet the criteria for a serialization format listed earlier:
compact, fast, extensible, and interoperable.
Serialization IDL
There are a number of other serialization frameworks that approach the problem in a dif-
ferent way: rather than defining types through code, you define them in a language-neut-
ral, declarative fashion, using an interface description language (IDL). The system can
then generate types for different languages, which is good for interoperability. They also
typically define versioning schemes that make type evolution straightforward.
Search WWH ::




Custom Search