Searching - Data Structures and Algorithm Analysis

Java Reference

In-Depth Information

introduced in Section 4.4. However, each has different performance characteristics

that make it the method of choice in particular circumstances.

The current chapter considers methods for searching data stored in lists. List in

this context means any list implementation including a linked list or an array. Most

of these methods are appropriate for sequences (i.e., duplicate key values are al-

lowed), although special techniques applicable to sets are discussed in Section 9.3.

The techniques from the first three sections of this chapter are most appropriate for

searching a collection of records stored in RAM. Section 9.4 discusses hashing,

a technique for organizing data in an array such that the location of each record

within the array is a function of its key value. Hashing is appropriate when records

are stored either in RAM or on disk.

Chapter 10 discusses tree-based methods for organizing information on disk,

including a commonly used file structure called the B-tree. Nearly all programs that

must organize large collections of records stored on disk use some variant of either

hashing or the B-tree. Hashing is practical for only certain access functions (exact-

match queries) and is generally appropriate only when duplicate key values are

not allowed. B-trees are the method of choice for dynamic disk-based applications

anytime hashing is not appropriate.

9.1

Searching Unsorted and Sorted Arrays

The simplest form of search has already been presented in Example 3.1: the se-

quential search algorithm. Sequential search on an unsorted list requires (n) time

in the worst case.

How many comparisons does linear search do on average? A major consid-

eration is whether K is in list L at all. We can simplify our analysis by ignoring

everything about the input except the position of K if it is found in L. Thus, we have

n + 1 distinct possible events: That K is in one of positions 0 to n 1 in L (each

position having its own probability), or that it is not in L at all. We can express the

probability that K is not in L as

n X

P(K 2 L) = 1

P(K = L[i])

i=1

where P(x) is the probability of event x.

Let p i be the probability that K is in position i of L (indexed from 0 to n 1.

For any position i in the list, we must look at i + 1 records to reach it. So we say

that the cost when K is in position i is i + 1. When K is not in L, sequential search

will require n comparisons. Let p n be the probability that K is not in L. Then the

average cost T(n) will be

Search WWH ::

Custom Search

Home