Java Reference
In-Depth Information
introduced in Section 4.4. However, each has different performance characteristics
that make it the method of choice in particular circumstances.
The current chapter considers methods for searching data stored in lists. List in
this context means any list implementation including a linked list or an array. Most
of these methods are appropriate for sequences (i.e., duplicate key values are al-
lowed), although special techniques applicable to sets are discussed in Section 9.3.
The techniques from the first three sections of this chapter are most appropriate for
searching a collection of records stored in RAM. Section 9.4 discusses hashing,
a technique for organizing data in an array such that the location of each record
within the array is a function of its key value. Hashing is appropriate when records
are stored either in RAM or on disk.
Chapter 10 discusses tree-based methods for organizing information on disk,
including a commonly used file structure called the B-tree. Nearly all programs that
must organize large collections of records stored on disk use some variant of either
hashing or the B-tree. Hashing is practical for only certain access functions (exact-
match queries) and is generally appropriate only when duplicate key values are
not allowed. B-trees are the method of choice for dynamic disk-based applications
anytime hashing is not appropriate.
9.1
Searching Unsorted and Sorted Arrays
The simplest form of search has already been presented in Example 3.1: the se-
quential search algorithm. Sequential search on an unsorted list requires (n) time
in the worst case.
How many comparisons does linear search do on average? A major consid-
eration is whether K is in list L at all. We can simplify our analysis by ignoring
everything about the input except the position of K if it is found in L. Thus, we have
n + 1 distinct possible events: That K is in one of positions 0 to n 1 in L (each
position having its own probability), or that it is not in L at all. We can express the
probability that K is not in L as
n X
P(K 2 L) = 1
P(K = L[i])
i=1
where P(x) is the probability of event x.
Let p i be the probability that K is in position i of L (indexed from 0 to n 1.
For any position i in the list, we must look at i + 1 records to reach it. So we say
that the cost when K is in position i is i + 1. When K is not in L, sequential search
will require n comparisons. Let p n be the probability that K is not in L. Then the
average cost T(n) will be
Search WWH ::




Custom Search