The format for record compression is almost identical to that for no compression, except
that the value bytes are compressed using the codec defined in the header. Note that keys
are not compressed.
Block compression (Figure 5-3) compresses multiple records at once; it is therefore more
compact than record compression and should generally be preferred, because it has
the opportunity to take advantage of similarities between records. Records are added to a
block until it reaches a minimum size in bytes, defined by the
io.seqfile.compress.blocksize property; the default is one million bytes. A
sync marker is written before the start of every block. The format of a block is a field
indicating the number of records in the block, followed by four compressed fields: the key
lengths, the keys, the value lengths, and the values.
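To make the size advantage concrete, the following is a minimal sketch that uses only java.util.zip.Deflater from the JDK, not Hadoop's SequenceFile or its codecs. The class name CompressionComparison, its methods, and the sample values are hypothetical and exist only for illustration. It compresses a set of similar values one at a time (as record compression does with values) and then all together (roughly as block compression does with the values field), and reports both sizes.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Illustrative only: this is not the SequenceFile on-disk format.
class CompressionComparison {

    // Compress the input with DEFLATE and return the compressed length in bytes.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Record-style: every value is compressed on its own.
    static int recordStyleSize(String[] values) {
        int total = 0;
        for (String value : values) {
            total += deflatedSize(value.getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    // Block-style: all values are concatenated and compressed in one pass.
    static int blockStyleSize(String[] values) {
        StringBuilder block = new StringBuilder();
        for (String value : values) {
            block.append(value);
        }
        return deflatedSize(block.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical sample data: values that share most of their content.
    static String[] sampleValues(int count) {
        String[] values = new String[count];
        for (int i = 0; i < count; i++) {
            values[i] = "user=" + i + " status=active region=eu-west plan=basic";
        }
        return values;
    }

    public static void main(String[] args) {
        String[] values = sampleValues(100);
        System.out.println("record-style bytes: " + recordStyleSize(values));
        System.out.println("block-style bytes:  " + blockStyleSize(values));
    }
}
```

With values this repetitive, the single-pass total should come out well below the per-value total, because the compressor can reuse patterns it has already seen in earlier records. A real block also compresses the key lengths, keys, and value lengths as separate fields, which this sketch leaves out.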
Figure 5-3. The internal structure of a sequence file with block compression
MapFile
A MapFile is a sorted SequenceFile with an index to permit lookups by key. The
index is itself a SequenceFile that contains a fraction of the keys in the map (every
128th key, by default). The idea is that the index can be loaded into memory to provide
fast lookups from the main data file, which is another SequenceFile containing all the
map entries in sorted key order.
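The lookup idea can be sketched in plain Java without any Hadoop classes. In the sketch below, the class SparseIndexSketch and all of its members are hypothetical: two in-memory lists stand in for the sorted data file, and a TreeMap stands in for the in-memory index. It assumes entries are appended in sorted key order and that the index records the first key and every interval-th key after it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: an in-memory stand-in for a data file plus sparse index.
class SparseIndexSketch {
    final List<String> keys = new ArrayList<>();      // "data file" keys, sorted
    final List<String> values = new ArrayList<>();    // "data file" values
    final TreeMap<String, Integer> index = new TreeMap<>(); // key -> position
    final int interval;

    SparseIndexSketch(int interval) {
        this.interval = interval;
    }

    // Assumes keys arrive in sorted order; indexes every interval-th entry.
    void append(String key, String value) {
        if (keys.size() % interval == 0) {
            index.put(key, keys.size());
        }
        keys.add(key);
        values.add(value);
    }

    // Find the nearest indexed key at or before the target, then scan forward.
    String get(String key) {
        Map.Entry<String, Integer> entry = index.floorEntry(key);
        if (entry == null) {
            return null; // target sorts before the first key
        }
        int start = entry.getValue();
        int end = Math.min(start + interval, keys.size());
        for (int i = start; i < end; i++) {
            int cmp = keys.get(i).compareTo(key);
            if (cmp == 0) {
                return values.get(i);
            }
            if (cmp > 0) {
                break; // passed the place where the key would be
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SparseIndexSketch map = new SparseIndexSketch(128);
        for (int i = 0; i < 1000; i++) {
            map.append(String.format("key-%05d", i), "value-" + i);
        }
        System.out.println("index entries: " + map.index.size());
        System.out.println("lookup: " + map.get("key-00300"));
    }
}
```

The trade-off is that a lookup scans at most one interval's worth of entries in the data, while the index holds only a fraction of the keys and so stays small enough to keep in memory.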
MapFile offers a very similar interface to SequenceFile for reading and writing. The
main thing to be aware of is that when writing using MapFile.Writer, map entries
must be added in order; otherwise, an IOException will be thrown.
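The ordering rule can be illustrated with a small stand-alone sketch. The class OrderedWriterSketch below is hypothetical and is not MapFile.Writer; it only mimics the behavior described above by remembering the last key and throwing an IOException when a smaller key arrives. Letting equal keys through is an assumption of this sketch.

```java
import java.io.IOException;

// Illustrative only: mimics an append-in-order check, stores nothing on disk.
class OrderedWriterSketch {
    private String lastKey;
    private int size;

    // Rejects any key that sorts before the previously appended key.
    void append(String key, String value) throws IOException {
        if (size != 0 && lastKey.compareTo(key) > 0) {
            throw new IOException("key out of order: " + key + " after " + lastKey);
        }
        lastKey = key;
        size++;
    }

    int size() {
        return size;
    }

    // Helper: returns true if appending the keys in this order is rejected.
    static boolean rejects(String... keys) {
        OrderedWriterSketch writer = new OrderedWriterSketch();
        try {
            for (String key : keys) {
                writer.append(key, "value");
            }
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("a, b, c rejected: " + rejects("a", "b", "c"));
        System.out.println("a, c, b rejected: " + rejects("a", "c", "b"));
    }
}
```

In practice this means unsorted input has to be sorted before it is written, since the writer checks the order but does not fix it.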