could find the hidden properties of molecules by doing some
sort of biochemical assay. 29
Most bioinformatic problems have this character—there are too many
data points, too many individual cases, to check one by one; only the
computer has the time and the power to process them. But how do you
decide when something is “real”? How do you decide whether the com-
puter is producing a biological result or an artifact? How do you trust
the machine?
When I posed this question to bioinformaticians, or when I observed
their work in the lab, it seemed that they were using the computer itself
to reassure themselves that it was producing “biology” rather than gar-
bage. This process was often described as a “sanity check”—tests were
specially designed to check that the computer was not taking the data
and doing something that made no sense. Bioinformaticians consider
these tests to be the computational equivalent of experimental con-
trols. In a traditional wet lab, a control is usually a well-characterized
standard—if you are measuring the length of a sequence on a gel, the
control might be a sequence of known length. If the experiment does
not reproduce the length of the control accurately, then it is unlikely
that it is measuring the lengths of unknown samples accurately either. A
computational sanity check could be described in the same terms: run
the computer program on a well-characterized data set for which the
result is known to check that it is giving sensible results. For example,
sometimes it is possible to use the computer to construct an alternative
data set (often using random elements) that, when run through a given
piece of software, should produce a predictable outcome. Just as with
my project, a large part of the work of doing bioinformatics revolves
around the construction of appropriate controls or sanity checks. This is
not a trivial task—the choice of how to construct such tests determines
the faith one can put in the final results. In other words, the quality and
thoroughness of the sanity checks are what ultimately give the work its
plausibility (or lack thereof).
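
The two kinds of control described above, a known-answer data set and a randomized data set, can be illustrated with a small sketch. It is not taken from any tool discussed in the text: the motif counter `count_motif`, the planted motif "GATTACA", the background sequence, and the number of shuffles are all hypothetical choices made for illustration. The sketch assumes that the tool under test takes a sequence and a motif and returns a count.

```python
import random


def count_motif(seq, motif):
    """Hypothetical tool under test: count occurrences of motif (overlaps allowed)."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def shuffled(seq, rng):
    """Randomized control: same base composition, original order destroyed."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def sanity_check(tool, n_shuffles=200, seed=0):
    """Run a known-answer control and a shuffled control against `tool`."""
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    motif = "GATTACA"          # illustrative motif, not from the source text
    planted = 5
    # Positive control: the background contains no G, so the motif
    # occurs exactly `planted` times in the constructed sequence.
    known = "AC" * 50 + (motif + "CC") * planted
    observed = tool(known, motif)

    # Randomized control: count shuffles that score at least as high
    # as the constructed sequence does.
    as_extreme = sum(
        1 for _ in range(n_shuffles)
        if tool(shuffled(known, rng), motif) >= observed
    )
    return {
        "positive_control_ok": observed == planted,
        # Add-one estimate so the empirical p-value is never exactly zero.
        "empirical_p": (as_extreme + 1) / (n_shuffles + 1),
    }


if __name__ == "__main__":
    print(sanity_check(count_motif))
```

A tool that passes should reproduce the planted count and should score the shuffled sequences lower than the constructed one. A tool that returns the same number for every input fails the first control and gains nothing over the shuffles in the second, which is the kind of nonsensical behavior a sanity check is designed to expose.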
Such sanity checks are actually simulations. They are (stochastic)
models of biological systems, implemented on a computer. Bioinforma-
ticians are aware that there is much about both their data and their
tools that they cannot know. As for the data, there are simply too many
to be able to examine them all by eye. As for their tools, they necessarily
remain in ignorance about the details of a piece of software, either because
it was written by someone else or because it is sufficiently complicated
that it may not behave as designed or expected. In both cases, it is