(so we can transparently read data written in an older format), and interoperable (so we
can read or write persistent data using different languages).
Hadoop uses its own serialization format, Writables, which is certainly compact and fast,
but not so easy to extend or use from languages other than Java. Because Writables are
central to Hadoop (most MapReduce programs use them for their key and value types),
we look at them in some depth in the next three sections, before looking at some of the
other serialization frameworks supported in Hadoop. Avro (a serialization system that was
designed to overcome some of the limitations of Writables) is covered in Chapter 12.
The Writable Interface
The Writable interface defines two methods — one for writing its state to a
DataOutput binary stream and one for reading its state from a DataInput binary
stream:
package org.apache.hadoop.io;

import java.io.DataOutput;
import java.io.DataInput;
import java.io.IOException;

public interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}
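To see how a type plugs into this interface, here is a minimal sketch of a hypothetical custom Writable wrapping a single long counter. The interface is reproduced locally so the sketch compiles on its own; in a real project you would implement org.apache.hadoop.io.Writable instead, and the class name and field here are illustrative assumptions, not Hadoop APIs:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Reproduced locally for a self-contained sketch; in real code,
// implement org.apache.hadoop.io.Writable instead.
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical example type: a Writable wrapping a single long value.
class LongCounterWritable implements Writable {
  private long value;

  public void set(long value) { this.value = value; }
  public long get() { return value; }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeLong(value); // write this object's state to the binary stream
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    value = in.readLong(); // restore this object's state from the binary stream
  }
}
```

A Writable must read its fields back in exactly the order it wrote them, since the stream carries no field names or type tags.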
Let's look at a particular Writable to see what we can do with it. We will use IntWritable, a wrapper for a Java int. We can create one and set its value using the set() method:
IntWritable writable = new IntWritable();
writable.set(163);
Equivalently, we can use the constructor that takes the integer value:
IntWritable writable = new IntWritable(163);
To examine the serialized form of the IntWritable, we write a small helper method that wraps a java.io.ByteArrayOutputStream in a java.io.DataOutputStream (an implementation of java.io.DataOutput) to capture the bytes in the serialized stream:
public static byte[] serialize(Writable writable) throws IOException {
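The serialized form can be previewed with plain java.io, without Hadoop on the classpath: an IntWritable writes its int with DataOutput's writeInt(), so the value 163 serializes to four big-endian bytes. A stdlib-only sketch (the direct writeInt() call stands in for IntWritable's write() method, an assumption stated here rather than Hadoop code):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class IntSerializedForm {
  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    DataOutputStream dataOut = new DataOutputStream(baos);
    dataOut.writeInt(163); // stands in for IntWritable's write() method
    dataOut.close();
    byte[] bytes = baos.toByteArray();
    // Prints the four big-endian bytes: 00 00 00 a3
    for (byte b : bytes) {
      System.out.printf("%02x ", b & 0xff);
    }
    System.out.println("(" + bytes.length + " bytes)");
  }
}
```

Big-endian byte order is what makes Writables comparable at the byte level, a property Hadoop exploits when sorting serialized keys.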