for a researcher to feel certain that a theorem is correct but have doubts about the
mechanics of the proof. 1
Some hypotheses are not amenable to formal analysis, particularly hypotheses that
involve the real world in some way. For example, human behaviour is intrinsic to
questions about interface design, and system properties can be intractably complex.
Consider an exploration to determine whether a new method is better than a previous
one at lossless compression of images—is it likely that material that is as diverse as
images can be modelled well enough to predict the performance of a compression
algorithm? It is also a mistake to suppose that an asymptotic analysis is always
sufficient. Nonetheless, the possibility of formal proof should never be overlooked.
Model. A model is a mathematical description of the hypothesis (or some component of the hypothesis, such as an algorithm whose properties are being considered) and there will usually be a demonstration that the hypothesis and model do indeed correspond.
In choosing to use a model, consider how realistic it will be, or conversely how
many simplifying assumptions need to be made for analysis to be feasible. Take the
example of modelling the cost of a Boolean query on a text collection, in which
the task is to find the documents that contain each of a set of words. We need to
estimate the frequency of each word (because words that are frequent in queries may
be rare in documents); the likelihood of query terms occurring in the same document
(in practice, query terms are thematically related, and do not model well as random
co-occurrences); the fact that longer documents contain more words, but are more
expensive to fetch; and, in a practical system, the probability that the same query had
been issued recently and the answers are cached in memory. It is possible to define
a model based on these factors, but, with so many estimates to make and parameters
to tune, it is unlikely that the model would be realistic.
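The proliferation of parameters can be made concrete with a deliberately crude sketch. Everything below is an invented assumption for illustration — the parameter names, default values, and the independence-with-a-fudge-factor treatment of term co-occurrence are not drawn from any real retrieval system:

```python
# A hypothetical, much-simplified cost model for a conjunctive (AND)
# Boolean query. Every parameter here would need to be estimated or
# tuned, which is exactly why such models tend to be unrealistic.

def expected_query_cost(term_doc_freqs, n_docs,
                        cooccurrence_boost=2.0,   # fudge factor: query terms
                                                  # are thematically related
                        fetch_cost_per_doc=1.0,   # cost units per document
                        cache_hit_prob=0.1):      # chance the query is cached
    """Estimate the cost of fetching all documents that contain every
    query term, under crude independence-style assumptions."""
    # If the same query was issued recently the answers are cached and
    # effectively free, so weight the cost by the miss probability.
    miss_prob = 1.0 - cache_hit_prob

    # Under independence, the chance that one document contains all the
    # terms is the product of the per-term probabilities; the boost
    # inflates this to mimic thematic co-occurrence.
    p_all = cooccurrence_boost
    for df in term_doc_freqs:
        p_all *= df / n_docs
    p_all = min(p_all, 1.0)

    expected_matches = p_all * n_docs
    return miss_prob * expected_matches * fetch_cost_per_doc

# Two terms appearing in 10,000 and 500 of a million documents:
cost = expected_query_cost([10_000, 500], 1_000_000)
```

Even this toy version has four parameters to estimate before it predicts anything, and it still ignores document length and disk layout — the point of the passage above.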
Simulation. A simulation is usually an implementation or partial implementation of a simplified form of the hypothesis, in which the difficulties of a full implementation are sidestepped by omission or approximation. At one extreme a simulation might be little more than an outline; for example, a parallel algorithm could be tested on a sequential machine by use of an interpreter that counts machine cycles and communication costs between simulated processors; at the other extreme a simulation could be an implementation of the hypothesis, but tested on artificial data. A simulation is a “white coats” test: artificial, isolated, and conducted in a tightly controlled environment.
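The parallel-algorithm example can be sketched in a few lines of sequential code. The tree-reduction scheme and the cost units below are illustrative assumptions, not a real machine model:

```python
# A toy simulation of a parallel algorithm on a sequential machine:
# a tree-structured parallel sum over one value per simulated
# processor, counting additions and inter-processor messages rather
# than measuring wall-clock time.

def simulated_parallel_sum(values):
    procs = list(values)      # one value per simulated processor
    add_steps = 0             # simulated arithmetic operations
    messages = 0              # simulated communications
    while len(procs) > 1:
        nxt = []
        for i in range(0, len(procs) - 1, 2):
            messages += 1     # processor i+1 "sends" its value to i
            nxt.append(procs[i] + procs[i + 1])
            add_steps += 1
        if len(procs) % 2:    # an odd processor carries its value over
            nxt.append(procs[-1])
        procs = nxt
    return procs[0], add_steps, messages

# 8 simulated processors combine their values pairwise in rounds.
total, steps, msgs = simulated_parallel_sum(range(8))
```

The simulation never runs anything in parallel; it only accounts for what a parallel run would cost, which is exactly the kind of omission-by-approximation described above.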
A great advantage of a simulation is that it provides parameters that can be smoothly adjusted, allowing the researcher to observe behaviour across a wide spectrum of inputs or characteristics. For example, if you are comparing algorithms for
removal of errors in genetic data, use of simulated data might allow you to control
the error rate, and observe when the different algorithms begin to fail. Real data may
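A minimal sketch of this kind of controlled corruption, assuming a four-letter DNA alphabet and independent per-base substitution errors (both simplifications):

```python
# Corrupt a known "true" sequence at a chosen per-base error rate, so
# that an error-removal method can later be scored against the truth.
import random

def corrupt(sequence, error_rate, rng):
    """Return a copy of sequence in which each base is independently
    replaced by a different random base with probability error_rate."""
    bases = "ACGT"
    out = []
    for base in sequence:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in bases if b != base]))
        else:
            out.append(base)
    return "".join(out)

rng = random.Random(0)            # fixed seed for reproducibility
truth = "ACGT" * 250              # a known 1000-base sequence
for rate in (0.01, 0.05, 0.20):   # sweep the error rate smoothly
    noisy = corrupt(truth, rate, rng)
    observed = sum(a != b for a, b in zip(truth, noisy)) / len(truth)
```

Because the true sequence is known, the researcher can dial the error rate up until each algorithm begins to fail — something real data rarely permits.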
have unknown numbers of errors, or only a couple of different error rates, so in some
sense can be less informative. However, with a simulation there is always the risk
1 Which can, of course, lead to the discovery that the theorem is wrong after all.