Java Reference
In-Depth Information
10
Indexing
Many large-scale computing applications are centered around data sets that are too
large to fit into main memory. The classic example is a large database of records
with multiple search keys, requiring the ability to insert, delete, and search for
records. Hashing provides outstanding performance for such situations, but only
in the limited case in which all searches are of the form “find the record with key
value K.” Many applications require more general search capabilities. One exam-
ple is a range query search for all records whose key lies within some range. Other
queries might involve visiting all records in order of their key value, or finding the
record with the greatest key value. Hash tables are not organized to support any of
these queries efficiently.
This chapter introduces file structures used to organize a large collection of
records stored on disk. Such file structures support efficient insertion, deletion, and
search operations, for exact-match queries, range queries, and largest/smallest key
value searches.
Before discussing such file structures, we must become familiar with some ba-
sic file-processing terminology. An entry-sequenced file stores records in the order
that they were added to the file. Entry-sequenced files are the disk-based equivalent
to an unsorted list and so do not support efficient search. The natural solution is to
sort the records by order of the search key. However, a typical database, such as a
collection of employee or customer records maintained by a business, might con-
tain multiple search keys. To answer a question about a particular customer might
require a search on the name of the customer. Businesses often wish to sort and
output the records by zip code order for a bulk mailing. Government paperwork
might require the ability to search by Social Security number. Thus, there might
not be a single “correct” order in which to store the records.
Indexing is the process of associating a key with the location of a correspond-
ing data record. Section 8.5 discussed the concept of a key sort, in which an index
file is created whose records consist of key/pointer pairs. Here, each key is asso-
ciated with a pointer to a complete record in the main database file. The index file
341
 
Search WWH ::




Custom Search