(so we can transparently read data written in an older format), and interoperable (so we
can read or write persistent data using different languages).
Hadoop uses its own serialization format, Writables, which is certainly compact and fast, but not so easy to extend or use from languages other than Java. Because Writables are central to Hadoop (most MapReduce programs use them for their key and value types), we look at them in some depth in the next three sections, before looking at some of the other serialization frameworks supported in Hadoop. Avro (a serialization system that was designed to overcome some of the limitations of Writables) is covered in Chapter 12.
The Writable Interface
The Writable interface defines two methods: one for writing its state to a DataOutput binary stream, and one for reading its state from a DataInput binary stream:
package org.apache.hadoop.io;

import java.io.DataOutput;
import java.io.DataInput;
import java.io.IOException;

public interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}
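To see the contract in action, here is a minimal, self-contained sketch of a class implementing these two methods and round-tripping its state through a byte array. The class and file names (SimpleIntWritable, WritableSketch) are illustrative, and the interface is redeclared locally so the sketch compiles without Hadoop on the classpath; in a real program you would implement org.apache.hadoop.io.Writable itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Local stand-in for org.apache.hadoop.io.Writable, so this sketch
// compiles without Hadoop on the classpath.
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// A toy Writable holding a single int, mirroring what IntWritable does.
class SimpleIntWritable implements Writable {
  private int value;

  public void set(int value) { this.value = value; }
  public int get() { return value; }

  public void write(DataOutput out) throws IOException {
    out.writeInt(value); // serialize state as four big-endian bytes
  }

  public void readFields(DataInput in) throws IOException {
    value = in.readInt(); // restore state from the stream
  }
}

public class WritableSketch {
  public static void main(String[] args) throws IOException {
    SimpleIntWritable original = new SimpleIntWritable();
    original.set(163);

    // Serialize into a byte array via a DataOutputStream.
    ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
    original.write(new DataOutputStream(bytesOut));
    byte[] bytes = bytesOut.toByteArray();

    // Deserialize into a fresh instance via a DataInputStream.
    SimpleIntWritable copy = new SimpleIntWritable();
    copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));

    System.out.println(bytes.length); // 4
    System.out.println(copy.get());   // 163
  }
}
```

Note that readFields() mutates an existing object rather than constructing a new one; this is deliberate, as it lets MapReduce reuse a single instance across many records instead of allocating one per record.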
Let's look at a particular Writable to see what we can do with it. We will use IntWritable, a wrapper for a Java int. We can create one and set its value using the set() method:
IntWritable writable = new IntWritable();
writable.set(163);
Equivalently, we can use the constructor that takes the integer value:
IntWritable writable = new IntWritable(163);
To examine the serialized form of the IntWritable, we write a small helper method that wraps a java.io.ByteArrayOutputStream in a java.io.DataOutputStream (an implementation of java.io.DataOutput) to capture the bytes in the serialized stream:
public static byte[] serialize(Writable writable) throws IOException {