Hadoop I/O - Hadoop: The Definitive Guide

Database Reference

In-Depth Information

a special entry to mark the sync point every few records as a sequence file is being writ-

ten. Such entries are small enough to incur only a modest storage overhead — less than

1%. Sync points always align with record boundaries.

Running the program in Example 5-11 shows the sync points in the sequence file as aster-

isks. The first one occurs at position 2021 (the second one occurs at position 4075, but is

not shown in the output):

% hadoop SequenceFileReadDemo numbers.seq

[128] 100 One, two, buckle my shoe

[173] 99 Three, four, shut the door

[220] 98 Five, six, pick up sticks

[264] 97 Seven, eight, lay them straight

[314] 96 Nine, ten, a big fat hen

[359] 95 One, two, buckle my shoe

[404] 94 Three, four, shut the door

[451] 93 Five, six, pick up sticks

[495] 92 Seven, eight, lay them straight

[545] 91 Nine, ten, a big fat hen

[590] 90 One, two, buckle my shoe

...

[1976] 60 One, two, buckle my shoe

[2021*] 59 Three, four, shut the door

[2088] 58 Five, six, pick up sticks

[2132] 57 Seven, eight, lay them straight

[2182] 56 Nine, ten, a big fat hen

...

[4557] 5 One, two, buckle my shoe

[4602] 4 Three, four, shut the door

[4649] 3 Five, six, pick up sticks

[4693] 2 Seven, eight, lay them straight

[4743] 1 Nine, ten, a big fat hen

There are two ways to seek to a given position in a sequence file. The first is the seek()

method, which positions the reader at the given point in the file. For example, seeking to a

record boundary works as expected:

reader . seek ( 359 );

assertThat ( reader . next ( key , value ), is ( true ));

assertThat ((( IntWritable ) key ). get (), is ( 95 ));

But if the position in the file is not at a record boundary, the reader fails when the

next() method is called:

reader . seek ( 360 );

reader . next ( key , value ); // fails with IOException

Search WWH ::

Custom Search

Home