Randomization - Data Structures and Problem Solving Using Java

Java Reference

In-Depth Information

random item and swap it with the item in position high . Take a third ran-

dom item and swap it with the item in the middle position. Then continue

as usual. As before, degenerate partitions are always possible, but they now

happen as a result of bad random numbers, not bad inputs.

Let us look at the differences between randomized and nonrandomized

algorithms. So far we have concentrated on nonrandomized algorithms.

When calculating their average running times, we assume that all inputs

are equally likely. This assumption does not hold, however, because nearly

sorted input, for instance, occurs much more often than is statistically

expected. This situation can cause problems for some algorithms, such as

quicksort. But when we use a randomized algorithm, the particular input is

no longer important. The random numbers are important, and we get an

expected running time, in which we average over all possible random num-

bers for any particular input. Using quickselect with random pivots (or a

shuffle preprocessing step) gives an O ( N ) expected time algorithm. That

is, for any input , including already sorted input, the running time is

expected to be O ( N ), based on the statistics of random numbers. On the

one hand an expected time bound is somewhat stronger than an average-

case time bound because the assumptions used to generate it are weaker

(random numbers versus random input) but it is weaker than the corre-

sponding worst-case time bound. On the other hand, in many instances

solutions that have good worst-case bounds frequently have extra overhead

built in to assure that the worst case does not occur. The O ( N ) worst-case

algorithm for selection, for example, is a marvelous theoretical result but

is not practical.

Randomized algorithms come in two basic forms. The first, as already

shown, always gives a correct answer but it could take a long time, depend-

ing on the luck of the random numbers. The second type is what we exam-

ine in the remainder of this chapter. Some randomized algorithms work in a

fixed amount of time but randomly make mistakes (presumably with low

probability), called false positives or false negatives. This technique is

commonly accepted in medicine. False positives and false negatives for

most tests are actually fairly common, and some tests have surprisingly

high error rates. Furthermore, for some tests the errors depend on the indi-

vidual, not random numbers, so repeating the test is certain to produce

another false result. In randomized algorithms we can rerun the test on the

same input using different random numbers. If we run a randomized algo-

rithm 10 times and get 10 positives—and if a single false positive is an

unlikely occurrence (say, 1 chance in 100)—the probability of 10 consecu-

tive false positives (1 chance in 100 10

Some randomized

algorithms work in

a fixed amount of

time but randomly

make mistakes

(presumably with

low probability).

These mistakes are

false positives or

false negatives .

or one hundred billion billion) is

essentially zero.

Data Structures and Problem Solving Using Java

Search WWH ::

Custom Search

Home