only two-thirds of the time was required. The explanation, it developed, was that the
smaller collection was kept with its index on a single disk drive while, for the larger
collection, the index had been placed on a separate drive due to space constraints. In
the case of the smaller collection, the accesses to data and index had been interfering
with each other, causing disk access delays that were largely absent when two drives
were used. 7 Another aspect of research that this incident illustrates is the need for
inventive experiments. Identifying a range of ways to alter the behaviour of a system,
then measuring their effect, can lead to unexpected revelations.
Compilers are a substantial cause of systematic error. Different versions of the same compiler can generate code whose performance varies dramatically for particular programs, as can different versions of system software such as
file managers. After an upgrade from one version of Linux to the next release, the
time to run an external sort routine we were testing rose from 3,500 s to 7,500 s.
However, a code profiler showed that some individual routines were running more
quickly.
In another experiment, we were troubled by a random error. Sometimes a run
completed in around an hour, but often took an hour and a half. Intermediate times
were not observed. The experiment involved a complex interaction between stored
data, a memory buffer, and temporary files, so some variability was reasonable, but we
expected a spread of results—not two widely separated clusters. The explanation was
the screen lock; while earlier experiments had been run on a server, we had recently
moved to a high-performance desktop PC. The slower runs had been overnight, when the PC was not in use and the screen lock was active.
Test your intuition on the following example. Suppose you write a program for
searching a large file of randomly selected strings. The first stage of the program
reads the set of strings into memory, creates an array of pointers (one to the start of
each string), then sorts the pointers so that the strings are in lexicographic order. The
program then reads a query string and uses binary search in the array to find whether
the query is present in the original string set.
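A minimal sketch of such a program is given below, using C's standard qsort and bsearch; the input file name and the fixed size limits are assumptions for illustration only, not part of the original description.

```c
/* Sketch: read newline-separated strings from a file, build an array of
   pointers to them, sort the pointers lexicographically, then answer
   membership queries from standard input by binary search.
   "strings.txt" and the limits below are illustrative assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRINGS 1000000
#define MAX_LINE    1024

/* qsort/bsearch pass pointers to the array elements, which are
   themselves pointers to the strings. */
static int cmp(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

int main(void)
{
    static char *strs[MAX_STRINGS];
    char line[MAX_LINE];
    size_t n = 0;

    FILE *fp = fopen("strings.txt", "r");   /* assumed input file */
    if (fp == NULL)
        return 1;

    /* Read the strings into memory, keeping a pointer to each. */
    while (n < MAX_STRINGS && fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\n")] = '\0';
        strs[n++] = strdup(line);
    }
    fclose(fp);

    /* Sort the pointers so that the strings are in lexicographic order. */
    qsort(strs, n, sizeof strs[0], cmp);

    /* Answer queries: report whether each query string is present. */
    while (fgets(line, sizeof line, stdin) != NULL) {
        line[strcspn(line, "\n")] = '\0';
        char *query = line;
        char **hit = bsearch(&query, strs, n, sizeof strs[0], cmp);
        printf("%s: %s\n", query, hit ? "found" : "not found");
    }
    return 0;   /* memory is not freed; acceptable for a one-shot test */
}
```

Note that only the pointers are moved during sorting; the strings themselves stay wherever they were placed when read, which is relevant to the questions below.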
1. If two searches in a row are for the same string, do both searches take about the
same length of time?
2. Suppose the number of strings in the file is increased by a factor of sixteen. Do
searches then take about four times as long?
3. Suppose linear search is used instead of binary search. If the original file is already
sorted, are searches the same speed as for an unsorted file? (Recall that the pointers
to the strings are sorted after the strings are read in.)
7 Problems of this kind, and their solutions, can be highly illuminating. In this case, we discovered
that disk seek times were a major component of total costs, accounting for around half of all elapsed
time. Had we been explicitly investigating the significance of seek costs, we might not have thought
of this experiment.
In an experiment undertaken in the mid-2000s, colleagues of mine found that the time
required for disk-based algorithms could vary by as much as 15 %, depending on whether the data
was stored near the inside or the outside of the disk platter. As they noted in their paper, to do such
experiments “you may need to become intimately acquainted with the behaviour of your disk drive”.