Java Reference
In-Depth Information
The moral is that, with a single disk drive, there often is no such thing as effi-
cient sequential processing of a data file. Thus, a sorting algorithm might be more
efficient if it performs a smaller number of non-sequential disk operations rather
than a larger number of logically sequential disk operations that require a large
number of seeks in practice.
As mentioned previously, the record size might be quite large compared to the
size of the key. For example, payroll entries for a large business might each store
hundreds of bytes of information including the name, ID, address, and job title for
each employee. The sort key might be the ID number, requiring only a few bytes.
The simplest sorting algorithm might be to process such records as a whole, reading
the entire record whenever it is processed. However, this will greatly increase the
amount of I/O required, because only a relatively few records will fit into a single
disk block. Another alternative is to do a key sort. Under this method, the keys are
all read and stored together in an index file, where each key is stored along with a
pointer indicating the position of the corresponding record in the original data file.
The key and pointer combination should be substantially smaller than the size of
the original record; thus, the index file will be much smaller than the complete data
file. The index file will then be sorted, requiring much less I/O because the index
records are smaller than the complete records.
Once the index file is sorted, it is possible to reorder the records in the original
database file. This is typically not done for two reasons. First, reading the records
in sorted order from the record file requires a random access for each record. This
can take a substantial amount of time and is only of value if the complete collection
of records needs to be viewed or processed in sorted order (as opposed to a search
for selected records). Second, database systems typically allow searches to be done
on multiple keys. For example, today's processing might be done in order of ID
numbers. Tomorrow, the boss might want information sorted by salary. Thus, there
might be no single “sorted” order for the full record. Instead, multiple index files
are often maintained, one for each sort key.
These ideas are explored further in
Chapter 10.
8.5.1
Simple Approaches to External Sorting
If your operating system supports virtual memory, the simplest “external” sort is
to read the entire file into virtual memory and run an internal sorting method such
as Quicksort. This approach allows the virtual memory manager to use its normal
buffer pool mechanism to control disk accesses. Unfortunately, this might not al-
ways be a viable option. One potential drawback is that the size of virtual memory
is usually limited to something much smaller than the disk space available. Thus,
your input file might not fit into virtual memory. Limited virtual memory can be
overcome by adapting an internal sorting method to make use of your own buffer
pool.
Search WWH ::




Custom Search