the reader to believe that your results can be extrapolated a million-fold. Yet, for
some problems, merely doubling the volume of data can introduce new challenges.
Such issues arise in the testing of techniques such as document search methods. They
also arise in challenging ways in the context of algorithms for analysis of DNA,
that is, the strings representing genomes. These algorithms have to contend with
several different forms of scale. One form is the length of the strings, which for
a single organism can vary from a few thousand characters (viruses) or millions
(bacteria) to billions (a vertebrate such as a human) or over a hundred billion (some
plants). Another form is the complexity of the genome; some contain a great deal of
internal redundancy or copying. Yet another form is the number of organisms being
simultaneously analyzed. A further, more subtle form of scale is the evolutionary
distance between the individual organisms—here, the dimension is of timescale or
diversity, rather than raw data volume. Each of these forms of scale has significant,
non-linear impact on the behaviour of commonly used bioinformatics algorithms.
The question of data volume arises in the formal statistical sense of power: whether
your data is of sufficient quantity, or quality, to allow observation of the effect that you
are seeking. For example, if you are comparing two parsers according to their ability
to accurately extract phrases from English text, it may be that statistical principles
will tell you that a collection of 100 examples is unlikely to be sufficient for the
anticipated improvement to be detected.
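The sample-size point can be made concrete with a standard power calculation. The sketch below is illustrative only: the accuracies of 0.80 and 0.85 are assumed figures, not from the text, and the normal approximation to a two-proportion z-test stands in for whatever test an experimenter would actually use.

```python
from math import sqrt, erf

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def two_proportion_power(p1, p2, n):
    """Approximate power of a two-sided two-proportion z-test at
    significance 0.05, with n examples per parser (normal approximation).

    p1, p2: the true per-example accuracies of the two parsers.
    """
    z_alpha = 1.96                              # critical value, alpha = 0.05
    pbar = (p1 + p2) / 2.0
    se0 = sqrt(2.0 * pbar * (1.0 - pbar) / n)   # standard error under H0
    se1 = sqrt(p1 * (1.0 - p1) / n +
               p2 * (1.0 - p2) / n)             # standard error under H1
    z = (abs(p1 - p2) - z_alpha * se0) / se1
    return norm_cdf(z)

# Hypothetical scenario: parser accuracies of 0.80 versus 0.85.
# With 100 examples the power is only about 15%, so the anticipated
# improvement would very likely go undetected; with 1,000 examples
# the power exceeds 80%.
print(two_proportion_power(0.80, 0.85, 100))
print(two_proportion_power(0.80, 0.85, 1000))
```

Calculations of this kind, done before the experiment, tell you whether the data collection is large enough for the effect you hope to see.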
Interpretation
When checking experimental design or outcomes, consider whether there are other
possible interpretations of the results; and, if so, design further tests to eliminate these
possibilities. Consider for instance the problem of finding whether a file stored on
disk contains a given string. One algorithm directly scans the file; another algorithm,
which has been found to give faster response, scans a compressed form of the file.
Further tests would be needed to identify whether the speed gain was because the
second algorithm used fewer machine cycles or because the compressed file was
fetched more quickly from disk.
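One practical way to separate those two explanations is to measure CPU time and wall-clock time side by side: a speed gain that appears in CPU time suggests fewer machine cycles, while a gain confined to wall-clock time points at faster disk fetches. The helper below is a minimal sketch of that idea; the scan functions it would time are hypothetical, not from the text, and a real test would also control for the operating system's file cache by comparing cold and warm runs.

```python
import time

def timed(fn, *args):
    """Return (wall_seconds, cpu_seconds) for one call to fn(*args).

    perf_counter measures elapsed wall-clock time (including any time
    blocked on disk I/O); process_time measures CPU time consumed by
    this process only. Comparing the two helps attribute a speed gain
    to fewer machine cycles versus faster I/O.
    """
    w0, c0 = time.perf_counter(), time.process_time()
    fn(*args)
    w1, c1 = time.perf_counter(), time.process_time()
    return w1 - w0, c1 - c0

# Hypothetical usage -- scan_plain and scan_compressed are placeholders:
#   wall_a, cpu_a = timed(scan_plain, "corpus.txt", "needle")
#   wall_b, cpu_b = timed(scan_compressed, "corpus.z", "needle")
# If cpu_b < cpu_a, the compressed-scan algorithm genuinely does less
# work; if only wall_b < wall_a, the gain is likely in disk transfer.
```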
Care is particularly needed when checking the outcome of negative or failed
experiments. A reader of the statement “we have shown that it is not possible to
make further improvement” may wonder whether what has actually been shown is
that the author is not competent to make further improvement. Moreover, the failure
of an experiment typically leads to it being redesigned—such failure is as likely to
expose problems in the tests as in the hypothesis itself. Design of experiments to
demonstrate the failure of the hypothesis is particularly challenging.
It is always worth considering whether the results obtained are sensible. For
example, are there rules of conservation that should apply to the experiment? This issue
was illustrated by one of my students, who was evaluating a classification method
in which documents were automatically allocated to one of several predetermined
categories. She reported numbers of true and false positives and negatives, as a
 