Java Reference
In-Depth Information
read(byte[]b) : Read some bytes from the current position in the file.
The current position moves forward as the bytes are read.
write(byte[]b) : Write some bytes at the current position in the file
(overwriting the bytes already at that position). The current position moves
forward as the bytes are written.
seek(longpos) : Move the current position in the file to pos .
This
allows bytes at arbitrary places within the file to be read or written.
close() : Close a file at the end of processing.
8.5
External Sorting
We now consider the problem of sorting collections of records too large to fit in
main memory. Because the records must reside in peripheral or external memory,
such sorting methods are called external sorts. This is in contrast to the internal
sorts discussed in Chapter 7 which assume that the records to be sorted are stored in
main memory. Sorting large collections of records is central to many applications,
such as processing payrolls and other large business databases. As a consequence,
many external sorting algorithms have been devised. Years ago, sorting algorithm
designers sought to optimize the use of specific hardware configurations, such as
multiple tape or disk drives. Most computing today is done on personal computers
and low-end workstations with relatively powerful CPUs, but only one or at most
two disk drives. The techniques presented here are geared toward optimized pro-
cessing on a single disk drive. This approach allows us to cover the most important
issues in external sorting while skipping many less important machine-dependent
details. Readers who have a need to implement efficient external sorting algorithms
that take advantage of more sophisticated hardware configurations should consult
the references in Section 8.6.
When a collection of records is too large to fit in main memory, the only prac-
tical way to sort it is to read some records from disk, do some rearranging, then
write them back to disk. This process is repeated until the file is sorted, with each
record read perhaps many times. Given the high cost of disk I/O, it should come as
no surprise that the primary goal of an external sorting algorithm is to minimize the
number of times information must be read from or written to disk. A certain amount
of additional CPU processing can profitably be traded for reduced disk access.
Before discussing external sorting techniques, consider again the basic model
for accessing information from disk. The file to be sorted is viewed by the program-
mer as a sequential series of fixed-size blocks. Assume (for simplicity) that each
block contains the same number of fixed-size data records. Depending on the ap-
plication, a record might be only a few bytes — composed of little or nothing more
than the key — or might be hundreds of bytes with a relatively small key field.
Records are assumed not to cross block boundaries.
These assumptions can be
Search WWH ::




Custom Search