Java Reference
In-Depth Information
relaxed for special-purpose sorting applications, but ignoring such complications
makes the principles clearer.
As explained in Section 8.2, a sector is the basic unit of I/O. In other words,
all disk reads and writes are for one or more complete sectors. Sector sizes are
typically a power of two, in the range 512 to 16K bytes, depending on the operating
system and the size and speed of the disk drive. The block size used for external
sorting algorithms should be equal to or a multiple of the sector size.
Under this model, a sorting algorithm reads a block of data into a buffer in main
memory, performs some processing on it, and at some future time writes it back to
disk. From Section 8.1 we see that reading or writing a block from disk takes on
the order of one million times longer than a memory access. Based on this fact, we
can reasonably expect that the records contained in a single block can be sorted by
an internal sorting algorithm such as Quicksort in less time than is required to read
or write the block.
Under good conditions, reading from a file in sequential order is more efficient
than reading blocks in random order. Given the significant impact of seek time on
disk access, it might seem obvious that sequential processing is faster. However,
it is important to understand precisely under what circumstances sequential file
processing is actually faster than random access, because it affects our approach to
designing an external sorting algorithm.
Efficient sequential access relies on seek time being kept to a minimum. The
first requirement is that the blocks making up a file are in fact stored on disk in
sequential order and close together, preferably filling a small number of contiguous
tracks. At the very least, the number of extents making up the file should be small.
Users typically do not have much control over the layout of their file on disk, but
writing a file all at once in sequential order to a disk drive with a high percentage
of free space increases the likelihood of such an arrangement.
The second requirement is that the disk drive's I/O head remain positioned
over the file throughout sequential processing. This will not happen if there is
competition of any kind for the I/O head. For example, on a multi-user time-shared
computer the sorting process might compete for the I/O head with the processes
of other users. Even when the sorting process has sole control of the I/O head, it
is still likely that sequential processing will not be efficient. Imagine the situation
where all processing is done on a single disk drive, with the typical arrangement
of a single bank of read/write heads that move together over a stack of platters. If
the sorting process involves reading from an input file, alternated with writing to an
output file, then the I/O head will continuously seek between the input file and the
output file. Similarly, if two input files are being processed simultaneously (such
as during a merge process), then the I/O head will continuously seek between these
two files.
Search WWH ::




Custom Search