For algorithms that operate on static data structures, it may be appropriate to consider the cost of creating that data structure. For example, binary search in a sorted array takes O(log n) time, but O(n log n) time could be required to initially sort the array.
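To make the trade-off concrete, here is a minimal Python sketch (the data sizes and the helper name contains are illustrative choices, not taken from the text): the one-off O(n log n) sort costs far more than a single O(log n) search, and pays for itself only when amortised over many searches.

    import bisect
    import random

    data = [random.random() for _ in range(1_000_000)]
    data.sort()  # one-off preprocessing cost: O(n log n) comparisons

    def contains(sorted_list, x):
        # Binary search in a sorted list: O(log n) comparisons per query.
        i = bisect.bisect_left(sorted_list, x)
        return i < len(sorted_list) and sorted_list[i] == x

    # For a single query, an O(n) scan of the unsorted data would be cheaper than
    # sorting first; the sort is justified only by a long sequence of queries.
    queries = [random.random() for _ in range(10_000)]
    print(sum(contains(data, q) for q in queries))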
Make sure that the domain of the analysis is clear, and be careful to analyze the
right component of the data. It would usually be appropriate, for example, to analyze
database algorithms as a function of the number of records, not of the length of
individual records. However, if record length can vary substantially, then it too should perhaps be considered. For algorithms that apply arithmetic to integers, it may be
appropriate to regard each arithmetic operation as having unit cost. On the other hand,
if the integers involved can be of arbitrary length (consider for example public-key
encryption algorithms that rely for privacy on the expense of prime factorization)
it is appropriate to regard the cost of the arithmetic operations as a function of the
number of bits in each integer.²
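As a small illustration of the last point, Python's built-in arbitrary-precision integers make it easy to observe that multiplication is far from unit cost once the operands are long; the bit lengths below are arbitrary choices for the demonstration, not figures from the text.

    import time

    for bits in (1_000, 10_000, 100_000, 1_000_000):
        x = (1 << bits) - 1  # an operand with the given number of bits, all ones
        start = time.perf_counter()
        for _ in range(100):
            _ = x * x  # cost grows with the number of bits in the operands
        elapsed = (time.perf_counter() - start) / 100
        print(f"{bits:>9}-bit operands: {elapsed:.6f} s per multiplication")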
Two subtle problems are that the dominant cost may change with scale, and that the cost that is dominant in theory may never dominate in practice. For example, a certain algorithm might require O(n log n) comparisons and O(n) disk accesses. In principle the asymptotic cost of the algorithm is O(n log n), but, given that a disk access may require 5 ms and a comparison less than a nanosecond, in practice the cost of the disk accesses might well dominate for any conceivable application.
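The figures above make this easy to quantify. Taking the comparison cost to be exactly one nanosecond (the text says only that it is less than a nanosecond, so this is a generous assumption), the total comparison cost of n log2 n nanoseconds exceeds the total disk cost of 5n milliseconds only when log2 n exceeds five million, that is, when n exceeds 2^5,000,000 records; for any input that could conceivably be stored, the O(n) disk term dominates.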
Some authors misunderstand the logic of asymptotic claims. For example,
Amdahl's law states that the lower bound for the time taken for an algorithm to
complete is determined by the part of the algorithm that is inherently sequential. The
remainder can be executed in parallel, and hence the time for this part can be reduced by adding processors; but no increase in the number of processors can affect the
lower bound. However, it has been claimed that Amdahl's law was broken, in the
context of a certain algorithm, by increasing both the size of the input data and the
number of processors. These changes had minimal impact on the sequential part of
the algorithm, so that the proportion of total processing time spent in the sequential
part was reduced; but this result does not contradict Amdahl's law, and so the claim
was false.
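In symbols, the standard statement of Amdahl's law, with P the fraction of the work that can be parallelised and N the number of processors, gives the speedup as 1 / ((1 - P) + P/N), which approaches the bound 1/(1 - P) as N grows; no number of processors can push it past that bound. Enlarging the input so that the sequential part becomes a smaller proportion of the whole changes P itself; it does not move the bound for any fixed P, which is why the claim does not contradict the law.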
Another fallacious claim was that, for a certain indexing technique, the time required to find matches to a pattern in a database was asymptotically sublinear in the database size: a remarkable result, because the probability that a record is a match to a given pattern is fixed, so that in the limit the number of matches must be linear in database size. The error was that the author had assumed that the length of the pattern was a logarithmic function of database size, so that the number of answers did not grow in proportion to the number of records.
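Written out, the objection is straightforward (p here is simply a label for the fixed per-record match probability): if each of n records matches the pattern with probability p, the expected number of answers is p × n, which is linear in n, so for large enough databases merely reporting the matches already rules out sublinear query time.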
² The asymptotic cost of search in a binary tree of size n is usually given as O(log n), but that assumes that the cost of comparing two keys is constant. If the keys are strings, the expected cost of a pairwise comparison is O(log n), because, as the search narrows, the number of bits to be inspected in each string grows; and thus the overall cost might be O(log² n). On current machines, however, the string comparison cost is unlikely to be significant, presenting the question of what exactly should be considered in the analysis. Perhaps cache fetches rather than machine instructions are the operation of interest.
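A short Python sketch makes the footnote's observation visible (the key length, alphabet, and collection size are arbitrary choices for the demonstration): during a binary search over sorted strings, the keys examined share ever-longer prefixes with the target, so successive comparisons must inspect more and more characters.

    import random
    import string

    def common_prefix_len(a, b):
        # Number of leading characters shared by the two strings.
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    random.seed(0)
    keys = sorted("".join(random.choices(string.ascii_lowercase, k=20))
                  for _ in range(200_000))

    def inspected_per_comparison(keys, target):
        # Binary search for target, recording roughly how many characters
        # each key comparison has to inspect before it can decide.
        counts, lo, hi = [], 0, len(keys)
        while lo < hi:
            mid = (lo + hi) // 2
            counts.append(common_prefix_len(keys[mid], target) + 1)
            if keys[mid] < target:
                lo = mid + 1
            else:
                hi = mid
        return counts

    print(inspected_per_comparison(keys, keys[123_456]))
    # The counts tend to rise as the search narrows: the comparisons are not unit cost.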