For algorithms that operate on static data structures, it may be appropriate to consider the cost of creating that data structure. For example, binary search in a sorted array takes O(log n) time, but O(n log n) time could be required to initially sort the array.
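To make the trade-off concrete, here is a minimal Python sketch (the data sizes and the helper name contains are illustrative choices, not taken from the text): the one-off O(n log n) sort costs far more than a single O(log n) search, and pays for itself only when amortised over many searches.

    import bisect
    import random

    data = [random.random() for _ in range(1_000_000)]
    data.sort()  # one-off preprocessing cost: O(n log n) comparisons

    def contains(sorted_list, x):
        # Binary search in a sorted list: O(log n) comparisons per query.
        i = bisect.bisect_left(sorted_list, x)
        return i < len(sorted_list) and sorted_list[i] == x

    # For a single query, an O(n) scan of the unsorted data would be cheaper than
    # sorting first; the sort is justified only by a long sequence of queries.
    queries = [random.random() for _ in range(10_000)]
    print(sum(contains(data, q) for q in queries))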
Make sure that the domain of the analysis is clear, and be careful to analyze the
right component of the data. It would usually be appropriate, for example, to analyze
database algorithms as a function of the number of records, not of the length of
individual records. However, if record length can vary substantially, then it too should perhaps be considered. For algorithms that apply arithmetic to integers, it may be
appropriate to regard each arithmetic operation as having unit cost. On the other hand,
if the integers involved can be of arbitrary length (consider for example public-key
encryption algorithms that rely for privacy on the expense of prime factorization)
it is appropriate to regard the cost of the arithmetic operations as a function of the
number of bits in each integer.²
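As a small illustration of the last point, Python's built-in arbitrary-precision integers make it easy to observe that multiplication is far from unit cost once the operands are long; the bit lengths below are arbitrary choices for the demonstration, not figures from the text.

    import time

    for bits in (1_000, 10_000, 100_000, 1_000_000):
        x = (1 << bits) - 1  # an operand with the given number of bits, all ones
        start = time.perf_counter()
        for _ in range(100):
            _ = x * x  # cost grows with the number of bits in the operands
        elapsed = (time.perf_counter() - start) / 100
        print(f"{bits:>9}-bit operands: {elapsed:.6f} s per multiplication")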
Two subtle problems are that the dominant cost may change with scale, and that the cost that is dominant in theory may never dominate in practice. For example, a certain algorithm might require O(n log n) comparisons and O(n) disk accesses. In principle the asymptotic cost of the algorithm is O(n log n), but, given that a disk access may require 5 ms and a comparison less than a nanosecond, in practice the cost of the disk accesses might well dominate for any conceivable application.
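The figures above make this easy to quantify. Taking the comparison cost to be exactly one nanosecond (the text says only that it is less than a nanosecond, so this is a generous assumption), the total comparison cost of n log2 n nanoseconds exceeds the total disk cost of 5n milliseconds only when log2 n exceeds five million, that is, when n exceeds 2^5,000,000 records; for any input that could conceivably be stored, the O(n) disk term dominates.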
Some authors misunderstand the logic of asymptotic claims. For example,
Amdahl's law states that the lower bound for the time taken for an algorithm to
complete is determined by the part of the algorithm that is inherently sequential. The
remainder can be executed in parallel, and hence the time for this part can be reduced by adding processors; but no increase in the number of processors can affect the
lower bound. However, it has been claimed that Amdahl's law was broken, in the
context of a certain algorithm, by increasing both the size of the input data and the
number of processors. These changes had minimal impact on the sequential part of
the algorithm, so that the proportion of total processing time spent in the sequential
part was reduced; but this result does not contradict Amdahl's law, and so the claim
was false.
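In symbols, the standard statement of Amdahl's law, with P the fraction of the work that can be parallelised and N the number of processors, gives the speedup as 1 / ((1 - P) + P/N), which approaches the bound 1/(1 - P) as N grows; no number of processors can push it past that bound. Enlarging the input so that the sequential part becomes a smaller proportion of the whole changes P itself; it does not move the bound for any fixed P, which is why the claim does not contradict the law.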
Another fallacious claim was that, for a certain indexing technique, the time required to find matches to a pattern in a database was asymptotically sublinear in the database size: a remarkable result, because the probability that a record is a match to a given pattern is fixed, so that in the limit the number of matches must be linear in database size. The error was that the author had assumed that the length of the pattern was a logarithmic function of database size, so that the number of answers did not grow in proportion to the number of records.
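Written out, the objection is straightforward (p here is simply a label for the fixed per-record match probability): if each of n records matches the pattern with probability p, the expected number of answers is p × n, which is linear in n, so for large enough databases merely reporting the matches already rules out sublinear query time.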
² The asymptotic cost of search in a binary tree of size n is usually given as O(log n), but that assumes that the cost of comparing two keys is constant. If the keys are strings, the expected cost of a pairwise comparison is O(log n), because, as the search narrows, the number of bits to be inspected in each string grows; and thus the overall cost might be O(log² n). On current machines, however, the string comparison cost is unlikely to be significant, presenting the question of what exactly should be considered in the analysis. Perhaps cache fetches rather than machine instructions are the operation of interest.
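A short Python sketch makes the footnote's observation visible (the key length, alphabet, and collection size are arbitrary choices for the demonstration): during a binary search over sorted strings, the keys examined share ever-longer prefixes with the target, so successive comparisons must inspect more and more characters.

    import random
    import string

    def common_prefix_len(a, b):
        # Number of leading characters shared by the two strings.
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    random.seed(0)
    keys = sorted("".join(random.choices(string.ascii_lowercase, k=20))
                  for _ in range(200_000))

    def inspected_per_comparison(keys, target):
        # Binary search for target, recording roughly how many characters
        # each key comparison has to inspect before it can decide.
        counts, lo, hi = [], 0, len(keys)
        while lo < hi:
            mid = (lo + hi) // 2
            counts.append(common_prefix_len(keys[mid], target) + 1)
            if keys[mid] < target:
                lo = mid + 1
            else:
                hi = mid
        return counts

    print(inspected_per_comparison(keys, keys[123_456]))
    # The counts tend to rise as the search narrows: the comparisons are not unit cost.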