In the worst case, where data is not uniformly distributed, the running
time could be linear and every item might be examined. In Exercise 5.25 you
are asked to construct such a case. However, if we assume that the items are
reasonably distributed, as with a phone book, the average number of
comparisons has been shown to be O(log log N). In other words, we apply the
logarithm twice in succession. For N = 4 billion, log N is about 32 and
log log N is roughly 5. Of course, there are some hidden constants in the
Big-Oh notation, but the extra logarithm can lower the number of iterations
considerably, so long as a bad case does not crop up. Proving the result
rigorously, however, is quite complicated.

Interpolation search has a better Big-Oh bound on average than does binary
search, but has limited practicality and a bad worst case.
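Although the text gives no code for it, a minimal sketch of interpolation
search in Java might look like the following; the method name and the use of
long arithmetic to avoid overflow are choices of this sketch, and the array
is assumed to be sorted:

    // Sketch of interpolation search on a sorted int array. Where binary
    // search always probes the middle, this probes the position suggested
    // by linear interpolation between the endpoint values.
    public static int interpolationSearch( int [ ] a, int x )
    {
        int low = 0;
        int high = a.length - 1;

        while( low <= high && x >= a[ low ] && x <= a[ high ] )
        {
            if( a[ high ] == a[ low ] )          // all remaining items equal
                return a[ low ] == x ? low : -1;

            // Probe proportionally to where x falls between a[low] and
            // a[high]; long arithmetic avoids int overflow in the numerator
            int mid = low + (int) ( (long) ( x - a[ low ] ) * ( high - low )
                                    / ( a[ high ] - a[ low ] ) );

            if( a[ mid ] < x )
                low = mid + 1;
            else if( a[ mid ] > x )
                high = mid - 1;
            else
                return mid;
        }

        return -1;                               // not found
    }

On uniformly distributed data the probe usually lands very close to the
target, which is where the O(log log N) average comes from; on skewed data
the probe can be badly wrong at every step, giving the linear worst case.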
5.7 checking an algorithm analysis
Once we have performed an algorithm analysis, we want to determine
whether it is correct and as good as we can possibly make it. One way to do
this is to code the program and see if the empirically observed running time
matches the running time predicted by the analysis.
When N increases by a factor of 10, the running time goes up by a factor
of 10 for linear programs, 100 for quadratic programs, and 1,000 for cubic
programs. Programs that run in O(N log N) take slightly more than 10 times
as long to run under the same circumstances. These increases can be hard to
spot if the lower-order terms have relatively large coefficients and N is not
large enough. An example is the jump from N = 10 to N = 100 in the running
time for the various implementations of the maximum contiguous subsequence
sum problem. Differentiating linear programs from O(N log N) programs, based
purely on empirical evidence, also can be very difficult.
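To see why, consider an O(N log N) program going from N = 10,000 to
N = 100,000: the running time grows by a factor of roughly
10 x (log 100,000 / log 10,000), which is about 12.5. That is close enough
to a linear program's factor of 10 that ordinary timing noise can hide the
difference.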
Another commonly used trick to verify that some program is O(F(N))
is to compute the values T(N)/F(N) for a range of N (usually spaced out by
factors of 2), where T(N) is the empirically observed running time. If F(N)
is a tight answer for the running time, then the computed values converge to a
positive constant. If F(N) is an overestimate, the values converge to zero. If
F(N) is an underestimate, and hence wrong, the values diverge.
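The reason is simple: if T(N) is roughly cF(N) plus lower-order terms, then
T(N)/F(N) is c plus a term that vanishes as N grows. If F(N) grows faster
than T(N), even the leading term is swamped and the ratio tends to zero; if
F(N) grows more slowly, the ratio increases without bound.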
As an example, suppose that we write a program to perform N random
searches using the binary search algorithm. Since each search is logarithmic,
we expect the total running time of the program to be O(N log N).
Figure 5.13 shows the actual observed running time for the routine for various
input sizes on a real (but extremely slow) computer. The last column is most
likely the converging column and thus confirms our analysis, whereas the
increasing numbers for T/N suggest that O(N) is an underestimate, and the
quickly decreasing values for T/N² suggest that O(N²) is an overestimate.
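A minimal sketch of such an experiment follows; the array contents, input
sizes, and output format are assumptions made for illustration (the timings
in Figure 5.13 were produced on the author's hardware). It times N random
searches with java.util.Arrays.binarySearch and prints the three ratios
discussed above:

    import java.util.Arrays;
    import java.util.Random;

    // Sketch of the experiment behind Figure 5.13: perform N random binary
    // searches on a sorted array of N items, measure the total time T, and
    // print T/N, T/(N log N), and T/N^2 for doubling values of N.
    public final class BinarySearchTiming
    {
        public static void main( String [ ] args )
        {
            Random rnd = new Random( 54321 );
            long sink = 0;          // consumed below so the JIT keeps the searches

            for( int n = 10000; n <= 640000; n *= 2 )
            {
                int [ ] a = new int [ n ];
                for( int i = 0; i < n; i++ )
                    a[ i ] = 2 * i;              // sorted, with gaps so some searches miss

                long start = System.nanoTime( );
                for( int i = 0; i < n; i++ )     // N random searches
                    sink += Arrays.binarySearch( a, rnd.nextInt( 2 * n ) );
                double t = System.nanoTime( ) - start;

                double logN = Math.log( n ) / Math.log( 2 );
                System.out.printf( "N = %7d   T/N = %8.2f   T/(N log N) = %8.4f   T/N^2 = %.8f%n",
                                   n, t / n, t / ( n * logN ), t / ( (double) n * n ) );
            }
            System.out.println( "(checksum: " + sink + ")" );
        }
    }

If the analysis is right, the T/(N log N) column should settle toward a
constant as N doubles, while T/N creeps upward and T/N² falls rapidly,
mirroring the behavior described for Figure 5.13.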