Tests should be fair rather than constructed to support the hypothesis. If the design of tests seems
biased towards the intended contribution, readers will not be persuaded by the results.
The topic of this chapter is the design, execution, and description of experiments
in computing. As elsewhere in this book, to some extent the material here draws on
my experience as a researcher. These examples are for the most part work that led to
successful outcomes—which is not to imply that all of my research has succeeded
to this extent.
Baselines
A first step in the design of experiments is to identify the benchmarks against which
your contribution will be measured. That is, it is essential to identify an appropriate
baseline. For example, no sensible researcher would claim that their new sorting
algorithm was a breakthrough on the basis that it is faster than bubblesort; instead,
the algorithm should be compared to the best previous method.
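To make the form of such a comparison concrete, here is a minimal sketch in Python; new_sort is a hypothetical placeholder for the contribution, and the built-in sorted serves as the strong baseline. Beating only bubblesort would tell the reader nothing.

    import random
    import timeit

    def bubblesort(a):
        # Weak baseline: quadratic exchange sort; outperforming it is uninformative.
        a = list(a)
        for i in range(len(a)):
            for j in range(len(a) - 1 - i):
                if a[j] > a[j + 1]:
                    a[j], a[j + 1] = a[j + 1], a[j]
        return a

    def new_sort(a):
        # Hypothetical stand-in for the method under test.
        return sorted(a)

    data = [random.random() for _ in range(2000)]
    for name, fn in [("bubblesort (weak baseline)", bubblesort),
                     ("sorted (strong baseline)", sorted),
                     ("new_sort (contribution)", new_sort)]:
        seconds = timeit.timeit(lambda: fn(data), number=5)
        print(f"{name:28s} {seconds:.3f}s")

Only the gap between new_sort and the strong baseline, measured on inputs of realistic size, supports any claim of improvement.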
A benchmark is only compelling if it is implemented to a high standard, and thus
it may be that comparison to a baseline is difficult because an implementation of a
competing method must be obtained. However, without such a comparison it may be
impossible for the reader to know whether the new method offers an improvement.
This is a barrier to entry: before you can begin to produce competitive work in an
area, it is necessary not only to become familiar with the methods and ideas described
in a body of literature but also to have access to a collection of appropriate tools and
resources. But the fact that there is a barrier to entry does not excuse poor science.
A danger in an ongoing research program is to fail to update the choice of baseline.
In the context of text indexing, for example, work in the 1980s on signature files
compared performance to that of inverted files as reported in papers from the
1970s. (One of these 1970s papers gave a figure for inverted file size of 50%–300%
of the indexed data, though a skeptical reading strongly suggests that the larger
figure is implausible.) Papers on signature files even in the 2000s continued to quote
these baselines, despite dramatic improvements in inverted files (and well-known
experiments reporting sizes such as 7%–10%). New work in signature files was
compared to previous work in the same area, but not to work on other pertinent
technologies.
A similar problem can arise when a well-known, widely available implementation
becomes commonly used as a reference point. When the worth of every new
contribution is shown by comparison to the same baseline system (or, in some cases,
baseline data set), in some respects the field benefits, because the use of the common
resource means that readers can have confidence that the baseline is accurate.
However, in other respects the field may suffer, because the advances that are being
described may not be cumulative.
Some new algorithms solve a novel problem, or solve an existing problem in a
novel way that is for some reason not comparable to previous work. There may still
be a clear baseline to compare to, however. For example, there may be an obvious