Searching - Data Structures and Algorithm Analysis

Java Reference

In-Depth Information

value in L greater than K, we do a linear search on the piece of length j 1 that

we know brackets K if it is in the list.

If we define m such that mj n < (m + 1)j, then the total cost of this

algorithm is at most m + j 1 3-way comparisons. (They are 3-way because at

each comparison of K with some L[i] we need to know if K is less than, equal to,

or greater than L[i].) Therefore, the cost to run the algorithm on n items with a

jump of size j is

n

j

T(n;j) = m + j 1 =

+ j 1:

What is the best value that we can pick for j? We want to minimize the cost:

n

j

min

1jn

+ j 1

Ta ke the derivative and solve for f 0 (j) = 0 to find th e minimum, which is

j = p n. In this case, the worst case cost will be roughly 2 p n.

This example invokes a basic principle of algorithm design. We want to bal-

ance the work done while selecting a sublist with the work done while searching a

sublist. In general, it is a good strategy to make subproblems of equal effort. This

is an example of a divide and conquer algorithm.

What if we extend this idea to three levels? We would first make jumps of

some size j to find a sublist of size j 1 whose end values bracket value K. We

would then work through this sublist by making jumps of some smaller size, say

j 1 . Finally, once we find a bracketed sublist of size j 1 1, we would do sequential

search to complete the process.

This probably sounds convoluted to do two levels of jumping to be followed by

a sequential search. While it might make sense to do a two-level algorithm (that is,

jump search jumps to find a sublist and then does sequential search on the sublist),

it almost never seems to make sense to do a three-level algorithm. Instead, when

we go beyond two levels, we nearly always generalize by using recursion. This

leads us to the most commonly used search algorithm for sorted arrays, the binary

search described in Section 3.5.

If we know nothing about the distribution of key values, then binary search

is the best algorithm available for searching a sorted array (see Exercise 9.22).

However, sometimes we do know something about the expected key distribution.

Consider the typical behavior of a person looking up a word in a large dictionary.

Most people certainly do not use sequential search! Typically, people use a mod-

ified form of binary search, at least until they get close to the word that they are

looking for. The search generally does not start at the middle of the dictionary. A

person looking for a word starting with 'S' generally assumes that entries beginning

with 'S' start about three quarters of the way through the dictionary. Thus, he or

Data Structures and Algorithm Analysis

Search WWH ::

Custom Search

Home