Success or otherwise should as far as possible be obvious, not subject to interpre-
tation. If one team develops a piece of software more quickly than does another, is
that because (as a researcher might have hypothesized) the first team used a new soft-
ware development methodology? Or is it because the first team is larger; or smaller;
or happier; or more competitive; or more experienced in the problem, the language,
or coding in general; or lucky with regard to bugs and design decisions; or not doing
the work while a major sporting event is on; or some other thing?
Another example of this principle is provided by the various improvements that
can be made to the standard quicksort algorithm, such as better choice of pivot values
and use of loops that avoid expensive procedure calls. With test data chosen to exercise
the various cases—such as initially unsorted, initially sorted, or many repetitions of
some values—experiments can show that the improvements do indeed lead to faster
sorting. What such experiments cannot show is that quicksort is inherently better
than, say, mergesort. While it might, for example, be possible to deduce that the
same kinds of improvement do not yield benefits for mergesort, nothing can be
deduced about the relative merits of the algorithms because the relative quality of the
implementations is unknown, and because the data has not been selected to examine
trends such as asymptotic performance.
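A sketch of two such refinements is given below, assuming a simple in-place formulation: median-of-three pivot selection and an insertion-sort cutoff for small partitions, which avoids the cost of further procedure calls, exercised on data sets of the kinds just described. The function names, cutoff value, and data sizes are illustrative choices rather than a prescribed implementation.

    import random

    def insertion_sort(a, lo, hi):
        # Sort a[lo..hi] in place; cheap for short runs.
        for i in range(lo + 1, hi + 1):
            key = a[i]
            j = i - 1
            while j >= lo and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key

    def quicksort_improved(a, lo=0, hi=None, cutoff=16):
        if hi is None:
            hi = len(a) - 1
        if hi - lo < cutoff:
            # Small partitions: insertion sort avoids further procedure calls.
            insertion_sort(a, lo, hi)
            return
        # Median-of-three pivot guards against sorted or reverse-sorted input.
        mid = (lo + hi) // 2
        if a[mid] < a[lo]:
            a[lo], a[mid] = a[mid], a[lo]
        if a[hi] < a[lo]:
            a[lo], a[hi] = a[hi], a[lo]
        if a[hi] < a[mid]:
            a[mid], a[hi] = a[hi], a[mid]
        pivot = a[mid]
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        quicksort_improved(a, lo, j, cutoff)
        quicksort_improved(a, i, hi, cutoff)

    # Test data chosen to exercise the cases mentioned above.
    datasets = {
        "initially unsorted": [random.random() for _ in range(10000)],
        "initially sorted": list(range(10000)),
        "many repeated values": [random.randint(0, 9) for _ in range(10000)],
    }
    for name, data in datasets.items():
        a = list(data)
        quicksort_improved(a)
        assert a == sorted(data), name

Such a harness can confirm that the refinements speed up quicksort on these inputs, but, for the reasons above, it says nothing about quicksort versus mergesort.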
For speed experiments based on a series of runs, the published results will be
either minimum, average, median, or maximum times. Maximum times can include
anomalies, such as a run during which a greedy process (a tape dump, for example)
shuts out other processes. Minimums can be underestimates, for example when the
time slice allocated to a process does not include any clock ticks. Nor are averages
always appropriate: outlying points may be the result of system dependencies.
Statistical considerations are discussed later.
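As an illustration, a repeated-runs timing harness along the following lines reports all four statistics side by side, making such discrepancies visible; the measured function and the number of runs are arbitrary placeholders.

    import random
    import statistics
    import time

    def time_runs(func, runs=20):
        # Time repeated runs and report all four summary statistics.
        times = []
        for _ in range(runs):
            start = time.perf_counter()
            func()
            times.append(time.perf_counter() - start)
        return {
            "min": min(times),
            "median": statistics.median(times),
            "mean": statistics.mean(times),
            "max": max(times),  # inflated if another process interferes during one run
        }

    # Example: timing a library sort of 100,000 random values.
    data = [random.random() for _ in range(100_000)]
    print(time_runs(lambda: sorted(data)))

Whichever statistic is reported, the choice should be stated and justified.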
Results may include some anomalies or peculiarities. These should be explained or
at least discussed. Don't discard anomalies unless you are certain they are irrelevant;
they may represent problems you haven't considered.
As the graph shows, the algorithm was much slower on two of the data sets.
We are still investigating this behaviour.
It is likewise valuable to explore behaviour at limits and to explain trends.
A common failing in experimental work is that complex processes are tested as a
whole, but not as components. Many proposed methods are pipelines or composites
of one kind or another, in which independent elements are combined to give a result.
For example, a search engine might consist of a crawler, for fetching pages; a parser,
for extracting content; and a query engine, for assessing similarity. The design of
each element has impact on the final results. If a researcher proposed a new engine
composed entirely of new components, but only tested it as a whole, the reader
would not learn to what extent each component was valuable; it might be that all of
the benefit came from just one of them. If a series of decisions has led to the final
form of the contribution, or the contribution is composed of separate elements or
stages, each should be assessed independently.
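One way to make such an assessment concrete is to substitute one new component at a time into a baseline pipeline, as sketched below for a toy two-component retrieval pipeline consisting of a parser and a ranking function; the documents, queries, components, and accuracy measure are all invented for illustration and stand in for the modules of a real system.

    import re

    # Toy collection and queries; each query names the document it should retrieve.
    DOCS = {"d1": "<p>Fast sorting algorithms</p>",
            "d2": "<p>Graph search methods</p>"}
    QUERIES = [("sorting", "d1"), ("graph", "d2")]

    # Baseline components.
    def parse_naive(text):
        return text.lower().split()

    def rank_first_match(query, parsed):
        # Return the first document containing the query term, if any.
        for doc_id, words in parsed.items():
            if query in words:
                return doc_id
        return None

    # Proposed replacement components.
    def parse_strip_tags(text):
        return re.sub(r"<[^>]+>", " ", text).lower().split()

    def rank_by_count(query, parsed):
        return max(parsed, key=lambda d: parsed[d].count(query))

    def evaluate(parse, rank):
        # Fraction of queries for which the intended document is retrieved.
        parsed = {doc_id: parse(text) for doc_id, text in DOCS.items()}
        hits = sum(rank(q, parsed) == wanted for q, wanted in QUERIES)
        return hits / len(QUERIES)

    variants = {
        "baseline": (parse_naive, rank_first_match),
        "new parser only": (parse_strip_tags, rank_first_match),
        "new ranker only": (parse_naive, rank_by_count),
        "both new components": (parse_strip_tags, rank_by_count),
    }
    for name, (parse, rank) in variants.items():
        print(f"{name}: {evaluate(parse, rank):.2f}")

In this contrived example the improvement comes entirely from the parser, which is exactly the kind of attribution that a whole-system comparison would conceal.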
Similarly, an experimental regime should include separate investigation of each
relevant variable: the reader needs to know what factors are influencing the results.