certainly cannot do is identify the best parameters for all data sets, or even identify
whether there are stable best parameters to choose.
It is for this reason that descriptions of the research cycle distinguish between
an observation phase (used to learn about the object under study) and a testing or
confirmation phase (used to validate hypotheses). If parameters have been derived by
tuning, the only way to establish their validity is to see if they give good behaviour
on other data. Choosing parameters to suit data, or choosing data to suit parameters,
in all likelihood invalidates the research.
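As a minimal sketch of this separation (in Python, with invented data, and a simple threshold rule standing in for any method with a tunable parameter), the parameter is chosen on one portion of the data and its validity checked on a portion that played no part in the tuning:

```python
import random

def evaluate(threshold, data):
    """Fraction of (value, label) pairs that the threshold rule classifies
    correctly; a stand-in for any method with a tunable parameter."""
    return sum((value > threshold) == label for value, label in data) / len(data)

random.seed(0)
data = []
for _ in range(1000):
    value = random.random()
    data.append((value, value + random.gauss(0, 0.2) > 0.6))  # invented labels

# Split once, up front: one part for tuning, one held out for confirmation.
tuning, held_out = data[:500], data[500:]

# Choose the parameter using the tuning data only.
best = max((t / 100 for t in range(101)), key=lambda t: evaluate(t, tuning))

print(f"tuned threshold: {best:.2f}")
print(f"accuracy on tuning data:   {evaluate(best, tuning):.3f}")
print(f"accuracy on held-out data: {evaluate(best, held_out):.3f}")
```

If the held-out accuracy falls well below the tuning accuracy, the "best" parameter is partly an artefact of the tuning data rather than a property of the method.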
The research in some fields is underpinned by the availability and use of reference
data sets. Such resources can be dramatically larger and more comprehensive than
the materials that could be created by a typical research team, are easy to explain to
readers, and, in principle, allow the direct comparison of work between institutions
and between papers. In some instances, it can be difficult to publish work unless a
reference data set has been used. However, use of such data also carries risks, in
particular of overfitting; that is, methods can become so specialized that they do not
work on other data.
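A toy model (with invented numbers, for illustration only) shows how this can happen even without deliberate tuning: selecting the best of many equally mediocre methods by their score on one fixed benchmark inflates the winner's score, and the inflation does not transfer to fresh data.

```python
import random

def score(method_id, dataset_id):
    """Invented model: every method has the same true quality, 0.70,
    plus data-set-specific noise."""
    rng = random.Random(method_id * 1000 + dataset_id)
    return 0.70 + rng.gauss(0, 0.03)

BENCHMARK, FRESH = 0, 1

# Pick the best of 50 candidate methods by benchmark score alone.
best = max(range(50), key=lambda m: score(m, BENCHMARK))

print(f"winner on the benchmark: {score(best, BENCHMARK):.3f}")  # typically ~0.77
print(f"same method, fresh data: {score(best, FRESH):.3f}")      # back near 0.70
```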
When considering what experiments to try, identify the data or input for which
the hypothesis is least likely to hold. These are the interesting cases: if they are not
tested—if only the cases where the hypothesis is most likely to hold are tested—then
the experiments won't prove much at all. The experiment should of course be a test
of the hypothesis; you need to verify that what you are testing is what you intended
to test, and an experiment should only succeed if the hypothesis is correct.
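For instance, suppose the hypothesis is that a quicksort variant needs only about n log n comparisons. Random inputs, on which the hypothesis is most likely to hold, will support it; the interesting case is already-sorted input. A sketch, using a deliberately naive first-element pivot:

```python
import random
import sys

sys.setrecursionlimit(10_000)  # sorted input drives the recursion n deep

def quicksort(a):
    """Quicksort with the first element as pivot; returns (sorted, comparisons)."""
    if len(a) <= 1:
        return a, 0
    pivot, rest = a[0], a[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    left_sorted, left_cmp = quicksort(left)
    right_sorted, right_cmp = quicksort(right)
    return left_sorted + [pivot] + right_sorted, left_cmp + right_cmp + len(rest)

random.seed(0)
n = 2000
random_input = [random.random() for _ in range(n)]
sorted_input = sorted(random_input)  # the case least likely to support the hypothesis

_, cmp_random = quicksort(random_input)
_, cmp_sorted = quicksort(sorted_input)
print(f"comparisons, random input: {cmp_random:>9,}")  # grows like n log n (~30,000 here)
print(f"comparisons, sorted input: {cmp_sorted:>9,}")  # n(n-1)/2 = 1,999,000: quadratic
```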
An underlying point, then, is that persuasive research requires appropriate data,
and thus you need to be confident that you can obtain good data before committing
to a particular research question. (In some fields, it may be that the research goal
is to obtain data: telescopes and particle accelerators are built to collect data, for
example. But, in computing, such research is extremely rare.) It follows that pursuit
of some questions, no matter how interesting they may be, will not be feasible for some
researchers.
Ask whether a single data set is sufficient, or whether multiple data sets are
required: for separate training and testing, or for independent confirmation. A related
question is whether multiple data sets are indeed sufficiently independent; subsamples
of a single large data set may, for practical purposes, be the same, and not yield
the truly independent confirmation that is being sought.
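A quick way to see the problem (with an invented corpus): two data sets subsampled separately from the same collection share, on average, a large fraction of their items.

```python
import random

random.seed(1)
corpus = range(100_000)

# Two data sets drawn separately, but from the same underlying corpus.
a = set(random.sample(corpus, 50_000))
b = set(random.sample(corpus, 50_000))

print(f"items shared by the two 'independent' samples: {len(a & b) / len(a):.0%}")  # ~50%
```

Disjoint splits avoid shared items, but the parts still share the collection process, and so may not provide independent confirmation either.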
Sometimes appropriate data can be artificial, or simulated; as noted in Chap. 4,
such data can allow a thorough exploration of the properties of an algorithm. But
such data should not be used without a clear understanding of its limitations. For
example, application of a new hash function to random data is unlikely to be a
convincing demonstration that the function is uniform, since the data was uniform
to begin with. Fundamentally, any scheme for generating artificial data relies on a
model, which embodies assumptions and, probably, simplifications. The strongest
defence of artificial data is to validate it against real data.
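The hash-function example can be made concrete with a deliberately weak hash and two kinds of keys (all invented for illustration; a formal test would compare the statistic against the chi-squared distribution with 63 degrees of freedom). Random keys spread across the buckets almost regardless of the hash; structured keys, like the identifiers found in real data, expose the weakness.

```python
import random
import string
from collections import Counter

BUCKETS = 64

def weak_hash(key):
    """A deliberately poor hash: sum of character codes, modulo BUCKETS."""
    return sum(map(ord, key)) % BUCKETS

def chi_squared(keys):
    """Chi-squared statistic for uniformity of weak_hash over the buckets."""
    counts = Counter(weak_hash(k) for k in keys)
    expected = len(keys) / BUCKETS
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(BUCKETS))

random.seed(2)
random_keys = ["".join(random.choices(string.ascii_lowercase, k=16))
               for _ in range(10_000)]
real_like_keys = [f"user{i:06d}" for i in range(10_000)]  # structured identifiers

print(f"chi-squared, random keys:    {chi_squared(random_keys):>10.1f}")  # near 63
print(f"chi-squared, real-like keys: {chi_squared(real_like_keys):>10.1f}")  # vastly larger
```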
A related question is how much data is required. Another way of phrasing the issue
is: to what volumes of data should your claims apply? If you
are making claims about terabytes (say), but testing on megabytes, you are asking