Indexing - Data Structures and Algorithm Analysis

Java Reference

In-Depth Information

10

Indexing

Many large-scale computing applications are centered around data sets that are too

large to fit into main memory. The classic example is a large database of records

with multiple search keys, requiring the ability to insert, delete, and search for

records. Hashing provides outstanding performance for such situations, but only

in the limited case in which all searches are of the form “find the record with key

value K.” Many applications require more general search capabilities. One exam-

ple is a range query search for all records whose key lies within some range. Other

queries might involve visiting all records in order of their key value, or finding the

record with the greatest key value. Hash tables are not organized to support any of

these queries efficiently.

This chapter introduces file structures used to organize a large collection of

records stored on disk. Such file structures support efficient insertion, deletion, and

search operations, for exact-match queries, range queries, and largest/smallest key

value searches.

Before discussing such file structures, we must become familiar with some ba-

sic file-processing terminology. An entry-sequenced file stores records in the order

that they were added to the file. Entry-sequenced files are the disk-based equivalent

to an unsorted list and so do not support efficient search. The natural solution is to

sort the records by order of the search key. However, a typical database, such as a

collection of employee or customer records maintained by a business, might con-

tain multiple search keys. To answer a question about a particular customer might

require a search on the name of the customer. Businesses often wish to sort and

output the records by zip code order for a bulk mailing. Government paperwork

might require the ability to search by Social Security number. Thus, there might

not be a single “correct” order in which to store the records.

Indexing is the process of associating a key with the location of a correspond-

ing data record. Section 8.5 discussed the concept of a key sort, in which an index

file is created whose records consist of key/pointer pairs. Here, each key is asso-

ciated with a pointer to a complete record in the main database file. The index file

341

Search WWH ::

Custom Search

Home