Database Reference
In-Depth Information
There are many ways to implement indexes, and we shall not attempt to survey the mat-
ter here. The bibliographic notes give suggestions for further reading. However, a hash
table is one simple way to build an index. The field or fields on which the index is based
form the hash-key for a hash function. Records have the hash function applied to value of
the hash-key, and the record itself is placed in the bucket whose number is determined by
the hash function. The bucket could be a list of records in main-memory, or a disk block,
for example.
Then, given a hash-key value, we can hash it, find the bucket, and need to search only
that bucket to find the records with that value for the hash-key. If we choose the number of
buckets B to be comparable to the number of records in the file, then there will be relatively
few records in any bucket, and the search of a bucket takes little time.
EXAMPLE 1.5 Figure 1.2 suggests what a main-memory index of records with name, ad-
dress, and phone fields might look like. Here, the index is on the phone field, and buckets
are linked lists. We show the phone 800-555-1212 hashed to bucket number 17. There is
an array of bucket headers , whose i th element is the head of a linked list for the bucket
numbered i . We show expanded one of the elements of the linked list. It contains a record
with name, address, and phone fields. This record is in fact one with the phone number
800-555-1212. Other records in that bucket may or may not have this phone number. We
only know that whatever phone number they have is a phone that hashes to 17.
Figure 1.2 A hash table used as an index; phone numbers are hashed to buckets, and the entire record is placed in the
bucket whose number is the hash value of the phone
1.3.4
Secondary Storage
It is important, when dealing with large-scale data, that we have a good understanding of
the difference in time taken to perform computations when the data is initially on disk, as
opposed to the time needed if the data is initially in main memory. The physical character-
istics of disks is another subject on which we could say much, but shall say only a little and
leave the interested reader to follow the bibliographic notes.
Disks are organized into blocks , which are the minimum units that the operating system
uses to move data between main memory and disk. For example, the Windows operating
Search WWH ::




Custom Search