Tests should be fair rather than constructed to support the hypothesis. If the design of tests seems
biased towards the intended contribution, readers will not be persuaded by the results.
The topic of this chapter is the design, execution, and description of experiments
in computing. As elsewhere in this book, to some extent the material here draws on
my experience as a researcher. These examples are for the most part work that led to
successful outcomes—which is not to imply that all of my research has succeeded
to this extent.
Baselines
A first step in the design of experiments is to identify the benchmarks against which
your contribution will be measured. That is, it is essential to identify an appropriate
baseline. For example, no sensible researcher would claim that their new sorting
algorithm was a breakthrough on the basis that it is faster than bubblesort; instead,
the algorithm should be compared to the best previous method.
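To make the form of such a comparison concrete, here is a minimal sketch in Python; new_sort is a hypothetical placeholder for the contribution, and the built-in sorted serves as the strong baseline. Beating only bubblesort would tell the reader nothing.

    import random
    import timeit

    def bubblesort(a):
        # Weak baseline: quadratic exchange sort; outperforming it is uninformative.
        a = list(a)
        for i in range(len(a)):
            for j in range(len(a) - 1 - i):
                if a[j] > a[j + 1]:
                    a[j], a[j + 1] = a[j + 1], a[j]
        return a

    def new_sort(a):
        # Hypothetical stand-in for the method under test.
        return sorted(a)

    data = [random.random() for _ in range(2000)]
    for name, fn in [("bubblesort (weak baseline)", bubblesort),
                     ("sorted (strong baseline)", sorted),
                     ("new_sort (contribution)", new_sort)]:
        seconds = timeit.timeit(lambda: fn(data), number=5)
        print(f"{name:28s} {seconds:.3f}s")

Only the gap between new_sort and the strong baseline, measured on inputs of realistic size, supports any claim of improvement.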
A benchmark is only compelling if it is implemented to a high standard, and thus
it may be that comparison to a baseline is difficult because an implementation of a
competing method must be obtained. However, without such a comparison it may be
impossible for the reader to know whether the new method offers an improvement.
This is a barrier to entry: before you can begin to produce competitive work in an
area, it is necessary not only to become familiar with the methods and ideas described
in a body of literature but also to have access to a collection of appropriate tools and
resources. But the fact that there is a barrier to entry does not excuse poor science.
A danger in an ongoing research program is to fail to update the choice of baseline.
In the context of text indexing, for example, work in the 1980s on signature files
compared performance to that of inverted files as reported in papers from the
1970s. (One of these 1970s papers gave a figure for inverted file size of 50%–300%
of the indexed data, though a skeptical reading strongly suggests that the larger
figure is implausible.) Papers on signature files even in the 2000s continued to quote
these baselines, despite dramatic improvements in inverted files (and well-known
experiments reporting sizes such as 7%–10%). New work in signature files was
compared to previous work in the same area, but not to work on other pertinent
technologies.
A similar problem can arise when a well-known, widely available implementation
becomes commonly used as a reference point. When the worth of every new
contribution is shown by comparison to the same baseline system (or, in some cases,
baseline data set), in some respects the field benefits, because the use of the common
resource means that readers can have confidence that the baseline is accurate.
However, in other respects the field may suffer, because the advances that are being
described may not be cumulative.
Some new algorithms solve a novel problem, or solve an existing problem in a
novel way that is for some reason not comparable to previous work. There may still
be a clear baseline to compare to, however. For example, there may be an obvious