a special entry to mark the sync point every few records as a sequence file is being
written. Such entries are small enough to incur only a modest storage overhead (less
than 1%). Sync points always align with record boundaries.
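To make the overhead and alignment claims concrete, here is a minimal sketch in plain Java. It is a toy model, not Hadoop's actual on-disk format: the record layout (a 4-byte length prefix plus payload), the 2,000-byte interval, the 20-byte marker entry, and the class name SyncMarkerSketch are all assumptions chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Hadoop's real SequenceFile layout.
class SyncMarkerSketch {
    static final int SYNC_INTERVAL = 2000;            // assumed: minimum bytes between markers
    static final byte[] MARKER = new byte[16];        // assumed: placeholder 16-byte marker value
    static final int MARKER_SIZE = 4 + MARKER.length; // 4-byte escape + marker bytes

    final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    final DataOutputStream out = new DataOutputStream(bytes);
    final List<Integer> boundaries = new ArrayList<>();    // offsets where a record ends
    final List<Integer> syncPositions = new ArrayList<>(); // offsets where a marker starts
    private int lastSync = 0;

    void append(String value) throws IOException {
        // A marker is only ever written between records, so it always
        // starts exactly where the previous record ended.
        if (out.size() - lastSync >= SYNC_INTERVAL) {
            syncPositions.add(out.size());
            out.writeInt(-1);   // escape value: no real record has a negative length
            out.write(MARKER);
            lastSync = out.size();
        }
        byte[] payload = value.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length);
        out.write(payload);
        boundaries.add(out.size());
    }

    // Fraction of the stream taken up by marker entries.
    double overhead() {
        return (double) (syncPositions.size() * MARKER_SIZE) / out.size();
    }

    public static void main(String[] args) throws IOException {
        SyncMarkerSketch file = new SyncMarkerSketch();
        for (int i = 0; i < 1000; i++) {
            file.append("One, two, buckle my shoe");
        }
        System.out.printf("markers=%d overhead=%.4f%n",
                file.syncPositions.size(), file.overhead());
    }
}
```

With these assumed sizes, one 20-byte entry is written per roughly 2,000 bytes of records, which is where an overhead of just under 1% comes from.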
Running the program in Example 5-11 shows the sync points in the sequence file as
asterisks. The first one occurs at position 2021 (the second one occurs at position 4075,
but is not shown in the output):
% hadoop SequenceFileReadDemo numbers.seq
[128] 100 One, two, buckle my shoe
[173] 99 Three, four, shut the door
[220] 98 Five, six, pick up sticks
[264] 97 Seven, eight, lay them straight
[314] 96 Nine, ten, a big fat hen
[359] 95 One, two, buckle my shoe
[404] 94 Three, four, shut the door
[451] 93 Five, six, pick up sticks
[495] 92 Seven, eight, lay them straight
[545] 91 Nine, ten, a big fat hen
[590] 90 One, two, buckle my shoe
...
[1976] 60 One, two, buckle my shoe
[2021*] 59 Three, four, shut the door
[2088] 58 Five, six, pick up sticks
[2132] 57 Seven, eight, lay them straight
[2182] 56 Nine, ten, a big fat hen
...
[4557] 5 One, two, buckle my shoe
[4602] 4 Three, four, shut the door
[4649] 3 Five, six, pick up sticks
[4693] 2 Seven, eight, lay them straight
[4743] 1 Nine, ten, a big fat hen
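The positions in this listing also suggest how much space a sync entry occupies. The short calculation below uses only offsets printed above and the second sync position quoted in the text. Reading the extra bytes as the size of the sync entry is an inference from those numbers, not something the output states directly, and the class and method names are hypothetical.

```java
class SyncEntrySize {
    // Bytes between the start of one record and the start of the next.
    static int span(int start, int nextStart) {
        return nextStart - start;
    }

    // Fraction of the bytes between two sync points taken up by one sync entry.
    static double overhead(int entryBytes, int firstSync, int secondSync) {
        return entryBytes / (double) (secondSync - firstSync);
    }

    public static void main(String[] args) {
        // "Three, four, shut the door" at an ordinary position ...
        int plain = span(173, 220);
        // ... and the same text at the starred position.
        int withSync = span(2021, 2088);
        int entryBytes = withSync - plain;
        System.out.println("extra bytes at the sync point: " + entryBytes);
        System.out.println("overhead: " + overhead(entryBytes, 2021, 4075));
    }
}
```

The record at the starred position is 20 bytes longer than the same record elsewhere, and 20 bytes per 2,054 is consistent with the "less than 1%" figure.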
There are two ways to seek to a given position in a sequence file. The first is the seek()
method, which positions the reader at the given point in the file. For example, seeking to a
record boundary works as expected:
reader.seek(359);
assertThat(reader.next(key, value), is(true));
assertThat(((IntWritable) key).get(), is(95));
But if the position in the file is not at a record boundary, the reader fails when the
next() method is called:
reader.seek(360);
reader.next(key, value); // fails with IOException
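Why a one-byte offset is fatal can be seen in a toy length-prefixed format. The sketch below is plain Java and does not use Hadoop's SequenceFile.Reader; the layout (a 4-byte length followed by the payload) and the names MisalignedSeekSketch, write, and readAt are assumptions made for illustration. The real reader's failure may differ in detail, but the cause is the same: bytes from the middle of a record are interpreted as a record header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Toy length-prefixed format; NOT Hadoop's SequenceFile.
class MisalignedSeekSketch {
    // Writes each value as a 4-byte length followed by its UTF-8 bytes.
    static byte[] write(String... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String v : values) {
            byte[] payload = v.getBytes(StandardCharsets.UTF_8);
            out.writeInt(payload.length);
            out.write(payload);
        }
        return bytes.toByteArray();
    }

    // Reads one record at `position`, trusting that a length prefix starts there.
    static String readAt(byte[] file, int position) throws IOException {
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(file, position, file.length - position));
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("negative record length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload); // throws EOFException when the "length" was garbage
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] file = write("One, two, buckle my shoe", "Three, four, shut the door");
        System.out.println(readAt(file, 0)); // aligned: reads the first record
        try {
            readAt(file, 1);                 // one byte past the boundary
        } catch (IOException e) {
            System.out.println("misaligned read failed: " + e);
        }
    }
}
```

At offset 1 the four bytes read as the length straddle the real length field and the first payload byte, so the reader asks for far more data than the file holds.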