for a researcher to feel certain that a theorem is correct but have doubts about the
mechanics of the proof. 1
Some hypotheses are not amenable to formal analysis, particularly hypotheses that
involve the real world in some way. For example, human behaviour is intrinsic to
questions about interface design, and system properties can be intractably complex.
Consider an exploration to determine whether a new method is better than a previous
one at lossless compression of images—is it likely that material that is as diverse as
images can be modelled well enough to predict the performance of a compression
algorithm? It is also a mistake to suppose that an asymptotic analysis is always
sufficient. Nonetheless, the possibility of formal proof should never be overlooked.
Model. A model is a mathematical description of the hypothesis (or some component of the hypothesis, such as an algorithm whose properties are being considered) and there will usually be a demonstration that the hypothesis and model do indeed correspond.
In choosing to use a model, consider how realistic it will be, or conversely how
many simplifying assumptions need to be made for analysis to be feasible. Take the
example of modelling the cost of a Boolean query on a text collection, in which
the task is to find the documents that contain each of a set of words. We need to
estimate the frequency of each word (because words that are frequent in queries may
be rare in documents); the likelihood of query terms occurring in the same document
(in practice, query terms are thematically related, and do not model well as random
co-occurrences); the fact that longer documents contain more words, but are more
expensive to fetch; and, in a practical system, the probability that the same query had
been issued recently and the answers are cached in memory. It is possible to define
a model based on these factors, but, with so many estimates to make and parameters
to tune, it is unlikely that the model would be realistic.
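The proliferation of parameters can be made concrete with a deliberately crude sketch. Everything below is an invented assumption for illustration — the parameter names, default values, and the independence-with-a-fudge-factor treatment of term co-occurrence are not drawn from any real retrieval system:

```python
# A hypothetical, much-simplified cost model for a conjunctive (AND)
# Boolean query. Every parameter here would need to be estimated or
# tuned, which is exactly why such models tend to be unrealistic.

def expected_query_cost(term_doc_freqs, n_docs,
                        cooccurrence_boost=2.0,   # fudge factor: query terms
                                                  # are thematically related
                        fetch_cost_per_doc=1.0,   # cost units per document
                        cache_hit_prob=0.1):      # chance the query is cached
    """Estimate the cost of fetching all documents that contain every
    query term, under crude independence-style assumptions."""
    # If the same query was issued recently the answers are cached and
    # effectively free, so weight the cost by the miss probability.
    miss_prob = 1.0 - cache_hit_prob

    # Under independence, the chance that one document contains all the
    # terms is the product of the per-term probabilities; the boost
    # inflates this to mimic thematic co-occurrence.
    p_all = cooccurrence_boost
    for df in term_doc_freqs:
        p_all *= df / n_docs
    p_all = min(p_all, 1.0)

    expected_matches = p_all * n_docs
    return miss_prob * expected_matches * fetch_cost_per_doc

# Two terms appearing in 10,000 and 500 of a million documents:
cost = expected_query_cost([10_000, 500], 1_000_000)
```

Even this toy version has four parameters to estimate before it predicts anything, and it still ignores document length and disk layout — the point of the passage above.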
Simulation. A simulation is usually an implementation or partial implementation of a simplified form of the hypothesis, in which the difficulties of a full implementation are sidestepped by omission or approximation. At one extreme a simulation might be little more than an outline; for example, a parallel algorithm could be tested on a sequential machine by use of an interpreter that counts machine cycles and communication costs between simulated processors; at the other extreme a simulation could be an implementation of the hypothesis, but tested on artificial data. A simulation is a “white coats” test: artificial, isolated, and conducted in a tightly controlled environment.
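The parallel-algorithm example can be sketched in a few lines of sequential code. The tree-reduction scheme and the cost units below are illustrative assumptions, not a real machine model:

```python
# A toy simulation of a parallel algorithm on a sequential machine:
# a tree-structured parallel sum over one value per simulated
# processor, counting additions and inter-processor messages rather
# than measuring wall-clock time.

def simulated_parallel_sum(values):
    procs = list(values)      # one value per simulated processor
    add_steps = 0             # simulated arithmetic operations
    messages = 0              # simulated communications
    while len(procs) > 1:
        nxt = []
        for i in range(0, len(procs) - 1, 2):
            messages += 1     # processor i+1 "sends" its value to i
            nxt.append(procs[i] + procs[i + 1])
            add_steps += 1
        if len(procs) % 2:    # an odd processor carries its value over
            nxt.append(procs[-1])
        procs = nxt
    return procs[0], add_steps, messages

# 8 simulated processors combine their values pairwise in rounds.
total, steps, msgs = simulated_parallel_sum(range(8))
```

The simulation never runs anything in parallel; it only accounts for what a parallel run would cost, which is exactly the kind of omission-by-approximation described above.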
A great advantage of a simulation is that it provides parameters that can be smoothly adjusted, allowing the researcher to observe behaviour across a wide spectrum of inputs or characteristics. For example, if you are comparing algorithms for
removal of errors in genetic data, use of simulated data might allow you to control
the error rate, and observe when the different algorithms begin to fail. Real data may
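A minimal sketch of this kind of controlled corruption, assuming a four-letter DNA alphabet and independent per-base substitution errors (both simplifications):

```python
# Corrupt a known "true" sequence at a chosen per-base error rate, so
# that an error-removal method can later be scored against the truth.
import random

def corrupt(sequence, error_rate, rng):
    """Return a copy of sequence in which each base is independently
    replaced by a different random base with probability error_rate."""
    bases = "ACGT"
    out = []
    for base in sequence:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in bases if b != base]))
        else:
            out.append(base)
    return "".join(out)

rng = random.Random(0)            # fixed seed for reproducibility
truth = "ACGT" * 250              # a known 1000-base sequence
for rate in (0.01, 0.05, 0.20):   # sweep the error rate smoothly
    noisy = corrupt(truth, rate, rng)
    observed = sum(a != b for a, b in zip(truth, noisy)) / len(truth)
```

Because the true sequence is known, the researcher can dial the error rate up until each algorithm begins to fail — something real data rarely permits.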
have unknown numbers of errors, or only a couple of different error rates, so in some
sense can be less informative. However, with a simulation there is always the risk
1 Which can, of course, lead to the discovery that the theorem is wrong after all.