Biomedical Engineering Reference
controls. For example, correlating gene expression of microarray runs in a timely manner is
impractical without computer-based statistical analysis and visualization tools. One reason is that
noise and variability are dynamic; most complex systems get noisier and accumulate variability with
time—hence the need for timely recalibration.
Approximation
The microarray experiment also illustrates that statistical summaries, probability-based predictions,
and estimates of variability introduced by various processes are at best approximations. For example,
a Punnett square allows a researcher to predict, with some degree of certainty, the outcome of mating
pea plants with specific characteristics. The degree to which the predictions hold is based on sample
size and the extent to which the explicit and implicit assumptions of the model are upheld. That is,
sample size, external variables that may affect pea plant phenotype, the method of recording and
analyzing data, and the basic design of the model all affect the accuracy of results.
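
As a rough illustration of how a Punnett square prediction and sample size interact, the following sketch enumerates the offspring genotypes of a single-gene cross and then compares the predicted ratio with simulated offspring counts. The allele labels ("A" for dominant, "a" for recessive), the function names, and the assumptions of complete dominance and independent segregation are illustrative choices, not details taken from the text.

```python
import random
from collections import Counter
from itertools import product


def punnett_square(parent_a: str, parent_b: str) -> Counter:
    """Count offspring genotypes for a single-gene cross, e.g. 'Aa' x 'Aa'."""
    offspring = Counter()
    for allele_a, allele_b in product(parent_a, parent_b):
        # Sort so that 'aA' and 'Aa' are counted as the same genotype.
        genotype = "".join(sorted(allele_a + allele_b))
        offspring[genotype] += 1
    return offspring


def dominant_fraction(square: Counter) -> float:
    """Predicted fraction of offspring showing the dominant phenotype.

    Assumes complete dominance: one uppercase allele is enough.
    """
    total = sum(square.values())
    dominant = sum(n for g, n in square.items() if any(c.isupper() for c in g))
    return dominant / total


def simulate_cross(parent_a: str, parent_b: str, n_offspring: int, seed: int = 0) -> float:
    """Observed dominant-phenotype fraction in a simulated sample of offspring."""
    rng = random.Random(seed)
    dominant = 0
    for _ in range(n_offspring):
        # Each parent passes on one of its two alleles at random.
        genotype = rng.choice(parent_a) + rng.choice(parent_b)
        if any(c.isupper() for c in genotype):
            dominant += 1
    return dominant / n_offspring


if __name__ == "__main__":
    square = punnett_square("Aa", "Aa")
    print(dict(square))               # {'AA': 1, 'Aa': 2, 'aa': 1}
    print(dominant_fraction(square))  # 0.75
    # Observed fractions for increasing sample sizes.
    for n in (10, 100, 10000):
        print(n, simulate_cross("Aa", "Aa", n))
```

The model predicts a fixed 3:1 ratio, while the simulated fraction generally sits closer to that prediction as the number of offspring grows. External variables and recording errors, which the text also mentions, are not modeled here.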
Interface Noise
Much of bioinformatics work involves interfacing mechanical, biological, and electronic systems, each
of which has its own non-linearities, variability, and noise sources. Furthermore, each interface
introduces noise and variability into the overall process. For example, translating analog fluorescence
intensity to a digital signal introduces noise, decreases overall system dynamic range, and adds
non-linearities and variability to the gene expression data. Similarly, the mechanical and optical-to-digital
interfaces in a nucleotide sequencing machine contribute noise, errors, and random variability to
sequence data.
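
One component of the noise added at an analog-to-digital interface can be sketched with an idealized uniform quantizer. The function names, the 8-bit resolution, and the normalized full-scale value of 1.0 below are assumptions made for illustration. A real scanner or sequencer has additional noise sources (optical, thermal, mechanical) that this sketch does not model.

```python
def quantize(value: float, full_scale: float, bits: int) -> float:
    """Map an analog value onto the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    # Values outside [0, full_scale] saturate, which limits dynamic range.
    clipped = min(max(value, 0.0), full_scale)
    # Integer code an idealized converter would emit.
    code = round(clipped / full_scale * levels)
    return code / levels * full_scale


def max_quantization_error(full_scale: float, bits: int) -> float:
    """Worst-case rounding error for in-range inputs: half of one step."""
    return full_scale / (2 * (2 ** bits - 1))


if __name__ == "__main__":
    for intensity in (0.1234, 0.5, 0.9876, 1.5):
        digital = quantize(intensity, full_scale=1.0, bits=8)
        print(intensity, digital, abs(digital - intensity))
    print(max_quantization_error(1.0, 8))
```

For in-range inputs the error never exceeds half a quantization step, while an out-of-range input such as 1.5 is clipped to full scale, so its true intensity is lost.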
Assumptions
Most statistical methods assume basic premises that hold regardless of the specific application in
bioinformatics. For example, one of the most popular statistical pattern classification methods is
Bayes' Theorem, developed by the clergyman Thomas Bayes in the 18th century. His theorem,
applied to such problems as determining the probability that disease is present given that a gene is
shown to be expressed in a microarray experiment, combines the prior probabilities of outcomes
together with the conditional probabilities of various input features in order to reach a conclusion.
Using the odds-likelihood form of Bayes' Theorem, the probability that a patient has a particular
disease can be calculated from three parameters: the pretest probability of the patient having the
disease, the probability that the test is positive in diseased people, and the probability that the test is
positive in non-diseased people.
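For example, probability ( p ) and odds are related as follows:
$$\text{odds} = \frac{p}{1 - p}, \qquad p = \frac{\text{odds}}{1 + \text{odds}}$$
In addition, the relationship between pretest and post-test odds is:
$$\text{post-test odds} = \text{pretest odds} \times \text{likelihood ratio}$$
where the likelihood ratio is the probability that the test is positive in diseased people divided by
the probability that the test is positive in non-diseased people.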
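
The odds-likelihood calculation described above can be sketched in a few lines. The function and parameter names are hypothetical, and the numbers in the usage example (a 10% pretest probability, a test that is positive in 90% of diseased and 5% of non-diseased people) are made up for illustration only.

```python
def prob_to_odds(p: float) -> float:
    """Convert a probability (0 <= p < 1) to odds."""
    return p / (1.0 - p)


def odds_to_prob(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def post_test_probability(
    pretest_p: float,
    p_pos_given_disease: float,
    p_pos_given_no_disease: float,
) -> float:
    """Probability of disease after a positive test, via the odds-likelihood form."""
    # Likelihood ratio for a positive result.
    likelihood_ratio = p_pos_given_disease / p_pos_given_no_disease
    post_test_odds = prob_to_odds(pretest_p) * likelihood_ratio
    return odds_to_prob(post_test_odds)


if __name__ == "__main__":
    # Illustrative values only: pretest 0.10, positive in 0.90 of diseased
    # and 0.05 of non-diseased people.
    p = post_test_probability(0.10, 0.90, 0.05)
    print(round(p, 3))  # 0.667
```

With these illustrative inputs the pretest odds are 1:9, the likelihood ratio is 18, and the post-test odds are 2:1, which corresponds to a probability of about 0.667.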