The second way to find a record boundary makes use of sync points. The sync(long position) method on SequenceFile.Reader positions the reader at the next sync point after position. (If there are no sync points in the file after this position, then the reader will be positioned at the end of the file.) Thus, we can call sync() with any position in the stream, not necessarily a record boundary, and the reader will reestablish itself at the next sync point so reading can continue:
    reader.sync(360);
    assertThat(reader.getPosition(), is(2021L));
    assertThat(reader.next(key, value), is(true));
    assertThat(((IntWritable) key).get(), is(59));
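To make the idea of resynchronizing from an arbitrary offset concrete, here is a minimal toy model in plain Java. It is not Hadoop's implementation: the class name SyncSeekSketch, the 4-byte MARKER value, and the byte layout are all assumptions made for illustration, and for simplicity the scan accepts a marker that starts exactly at the given position. It only shows the general technique of scanning forward for a known marker and falling back to the end of the data when none is found.

```java
import java.util.Arrays;

// Toy model of sync-point seeking. This is NOT Hadoop's SequenceFile code;
// the marker value and data layout below are hypothetical.
class SyncSeekSketch {
    // Hypothetical 4-byte marker chosen for this sketch only.
    static final byte[] MARKER = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};

    // Returns the offset of the first marker starting at or after position,
    // or data.length (the "end of file") if no marker follows.
    static int sync(byte[] data, int position) {
        for (int i = Math.max(position, 0); i + MARKER.length <= data.length; i++) {
            byte[] window = Arrays.copyOfRange(data, i, i + MARKER.length);
            if (Arrays.equals(window, MARKER)) {
                return i;
            }
        }
        return data.length;
    }

    public static void main(String[] args) {
        // Three payload bytes, a marker, five payload bytes, a marker, one payload byte.
        byte[] data = {
            1, 2, 3,
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
            4, 5, 6, 7, 8,
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
            9
        };
        System.out.println(sync(data, 0));   // first marker
        System.out.println(sync(data, 4));   // starts mid-marker, so skips to the second
        System.out.println(sync(data, 13));  // no marker left, so end of data
    }
}
```

The caller does not need to know where records begin: any offset is acceptable, and the scan lands on a position from which reading can safely resume.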
WARNING
SequenceFile.Writer has a method called sync() for inserting a sync point at the current position in the stream. This is not to be confused with the hsync() method defined by the Syncable interface for synchronizing buffers to the underlying device (see Coherency Model).
Sync points come into their own when using sequence files as input to MapReduce, since they permit the files to be split and different portions to be processed independently by separate map tasks (see SequenceFileInputFormat).
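The following sketch illustrates why sync points make splitting safe. It is a simplified model in plain Java, not the logic used by SequenceFileInputFormat: the class SplitSketch, the method readRange, and the offsets are hypothetical, and offset 0 is treated as the start of the data. Each task is assumed to begin at the first sync point at or after its split start and to stop at the first sync point at or after its split end, so the ranges read by neighboring tasks meet exactly and no record is read twice or skipped.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Toy model of how sync points let byte-oriented splits align to record boundaries.
// This is an illustration only, not Hadoop's record reader.
class SplitSketch {
    // Returns {begin, stop}: the range actually read by the task that owns
    // the split [start, end). Offsets past the last sync point map to fileLength.
    static long[] readRange(TreeSet<Long> syncPoints, long fileLength, long start, long end) {
        Long begin = syncPoints.ceiling(start);
        Long stop = syncPoints.ceiling(end);
        return new long[] {
            begin == null ? fileLength : begin,
            stop == null ? fileLength : stop
        };
    }

    public static void main(String[] args) {
        // Hypothetical sync point offsets; 0 stands for the start of the data.
        TreeSet<Long> syncPoints = new TreeSet<>(Arrays.asList(0L, 2000L, 4000L, 6000L));
        long fileLength = 7000L;
        long splitSize = 2500L;

        for (long start = 0; start < fileLength; start += splitSize) {
            long end = Math.min(start + splitSize, fileLength);
            long[] range = readRange(syncPoints, fileLength, start, end);
            System.out.printf("split [%d, %d) reads [%d, %d)%n", start, end, range[0], range[1]);
        }
    }
}
```

With these made-up numbers, the three splits read the ranges [0, 4000), [4000, 6000), and [6000, 7000), which together cover the file once even though the split boundaries at 2500 and 5000 fall in the middle of records.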
Displaying a SequenceFile with the command-line interface
The hadoop fs command has a -text option to display sequence files in textual form. It looks at a file's magic number so that it can attempt to detect the type of the file and appropriately convert it to text. It can recognize gzipped files, sequence files, and Avro datafiles; otherwise, it assumes the input is plain text.
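Magic-number detection of this kind can be sketched in a few lines of plain Java. The example below is an illustration, not the code behind hadoop fs -text: the class MagicSniffSketch and its methods are hypothetical. It relies on the leading bytes of each format: 0x1f 0x8b for gzip, the characters SEQ for sequence files, and the characters Obj followed by the byte 1 for Avro datafiles. Anything else is treated as plain text.

```java
import java.nio.charset.StandardCharsets;

// Illustrative file-type sniffing by magic number; names here are hypothetical
// and this is not the detection code used by the hadoop command itself.
class MagicSniffSketch {
    // Classifies a file from its first few bytes.
    static String detect(byte[] head) {
        if (startsWith(head, 0x1f, 0x8b)) return "gzip";
        if (startsWith(head, 'S', 'E', 'Q')) return "sequence file";
        if (startsWith(head, 'O', 'b', 'j', 1)) return "avro datafile";
        return "plain text";
    }

    // True if head begins with the given unsigned byte values.
    static boolean startsWith(byte[] head, int... magic) {
        if (head.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((head[i] & 0xff) != magic[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(detect(new byte[] {0x1f, (byte) 0x8b, 8}));
        System.out.println(detect("SEQ\u0006".getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(detect(new byte[] {'O', 'b', 'j', 1}));
        System.out.println(detect("hello".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Masking each byte with 0xff matters because Java bytes are signed, so 0x8b would otherwise compare as a negative number.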
For sequence files, this command is really useful only if the keys and values have meaningful string representations (as defined by the toString() method). Also, if you have your own key or value classes, you will need to make sure they are on Hadoop's classpath.
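The point about toString() can be seen with a small plain-Java sketch. The classes below are hypothetical stand-ins rather than Hadoop Writable types, and the tab-separated line layout is an assumption made for illustration. A class that overrides toString() prints readably, while one that does not falls back to the default Object representation of a class name and hash code.

```java
// Illustrative only: hypothetical key classes, not Hadoop Writable implementations.
class TextLineSketch {
    // A key type with a meaningful string form.
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public String toString() { return "(" + x + "," + y + ")"; }
    }

    // A key type that keeps the default Object.toString().
    static class Raw {
        final int x;
        Raw(int x) { this.x = x; }
    }

    // Formats one record as text; the tab separator is an assumption of this sketch.
    static String line(Object key, Object value) {
        return key + "\t" + value;
    }

    public static void main(String[] args) {
        System.out.println(line(100, "One, two, buckle my shoe"));
        System.out.println(line(new Point(1, 2), "readable key"));
        System.out.println(line(new Raw(1), "unreadable key")); // class name plus hash code
    }
}
```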
Running it on the sequence file we created in the previous section gives the following output:
% hadoop fs -text numbers.seq | head
100 One, two, buckle my shoe
99 Three, four, shut the door
98 Five, six, pick up sticks
97 Seven, eight, lay them straight
96 Nine, ten, a big fat hen