As the number of transistors in a microprocessor and the number of lines of code in operating
systems climb into the millions, managing the resulting complexity becomes a central challenge.
Complexity theory explains how extremely small errors in the initial conditions of a complex
system, such as a single mistake in a million-line piece of code, can grow to influence behavior
on a much larger scale. It's no surprise, then, that PCs occasionally fail or crash because of
"memory leaks" and other nonspecific symptoms of system complexity.
Sometimes, the results are more insidious, such as the math errors caused by a defect in Intel's
original Pentium chip.
Fortunately, technologies have been developed to catch potential problems before
they surface. For example, decision tables—matrices of possible input and output states—can help
identify combinations of input conditions that should be tried when testing a microprocessor. When
the number of possible input conditions rises to the hundreds, decision tables and other state-
validation tools make an otherwise impossible task doable.
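To make the idea concrete, here is a minimal sketch of a decision table in Python. The condition names and the toy output column are hypothetical illustrations, not drawn from any particular processor's test plan:

```python
from itertools import product

# Hypothetical input conditions for a device under test; each condition
# can independently be True or False.
CONDITIONS = {
    "reset_asserted":    (True, False),
    "interrupt_pending": (True, False),
    "cache_enabled":     (True, False),
}

def expected_action(rule):
    """Toy output column: the behavior a tester should observe for a rule."""
    if rule["reset_asserted"]:
        return "reinitialize"
    if rule["interrupt_pending"]:
        return "service interrupt"
    return "execute next instruction"

# Each row of the decision table pairs one combination of input
# conditions with its expected output state.
for values in product(*CONDITIONS.values()):
    rule = dict(zip(CONDITIONS, values))
    print(rule, "->", expected_action(rule))
```

With only three binary conditions the table has eight rows; with hundreds of conditions the full cross product explodes, which is exactly where tools that prune and prioritize rules become indispensable.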
Archiving
As illustrated by the gene sequencing machine, the end result of processing the DNA fragments is
volumes of data that must be stored for a variety of uses. For example, the sequence data can be
compared with other investigators' data to check for inconsistencies or to validate findings. The data can be
processed locally in order to visualize the most likely protein structures that would result from
translation of the nucleotide sequences. In addition, the data can be submitted to one of the national
databases to support the work of other microbiologists or to give the researcher academic credit for
the electronic publication. One reason for creating biological databases, then, is to support the
analysis and communication of data, information, and metadata relevant to molecular biologists. In
many respects, the functions of archiving, processing, and communications overlap significantly.
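As a small example of the local processing mentioned above, the sketch below translates a nucleotide fragment into its polypeptide product using Biopython (assumptions: the Biopython package is installed, and the fragment itself is hypothetical):

```python
from Bio.Seq import Seq  # assumes Biopython is installed

# A hypothetical 30-nucleotide fragment from a sequencing run.
fragment = Seq("ATGGTGCATCTGACTCCTGAGGAGAAGTCT")

# Translate the reading frame into its polypeptide product using the
# standard genetic code.
protein = fragment.translate()
print(protein)  # MVHLTPEEKS
```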
Just as the transfer of data from DNA to RNA to protein relies on an information infrastructure, data
archives rely on an information technology (IT) infrastructure. This IT infrastructure includes network
and database technologies as well as standard vocabularies to store and access information. Even
though sequencing and other molecular biology data is vast and growing daily, there are huge gaps
in our understanding of how these databases relate to each other and to higher-level disease
databases. One motivation for constructing archives and linking them together is to close these
gaps as quickly as possible.
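As one concrete way an investigator might tap such linked archives, the sketch below retrieves a GenBank record from NCBI through Biopython's Entrez interface. It assumes Biopython is installed, network access is available, and a valid contact e-mail is supplied; the accession shown (NM_000518, human beta-globin mRNA) is just an illustration:

```python
from Bio import Entrez  # assumes Biopython and network access

Entrez.email = "researcher@example.org"  # NCBI requests a contact address

# Fetch one archived record (human beta-globin mRNA) in GenBank format.
handle = Entrez.efetch(db="nucleotide", id="NM_000518",
                       rettype="gb", retmode="text")
record_text = handle.read()
handle.close()

print(record_text.splitlines()[0])  # the LOCUS line of the record
```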
For the molecular biologist involved in developing or using databases, it's important to consider the
processes involved in managing data before focusing on the technology. That is, the process of data
collection, use, and dissemination should drive technology. After all, Mendel's notebooks didn't
dictate his experiments with garden-variety peas, but they empowered him by leveraging his
capacity to recall previous experiments, to plan future ones, and to publish his findings.
Numerical Processing
Computers are recognized foremost for their computational or numerical-processing capabilities. In
bioinformatics, applications for numerical-processing techniques range from sequence analysis,
microarray data analysis, and site prediction to gene finding, protein structure prediction, and
phylogenetic analysis. These applications in turn rely on methods ranging from pattern matching,
simulation, and data mining to machine learning, statistics, cluster analysis, and decision trees. For
example, consider the pattern-matching challenge associated with multiple string alignment—aligning
multiple polypeptide sequences—as a means of discovering potential homologous relationships
between proteins. Because millions of calculations may be involved in examining even three or four
relatively short sequences, the far more formidable task of aligning multiple sequences several
hundred amino acids in length is usually computationally prohibitive on even the fastest desktop
hardware.
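To make the arithmetic concrete, the sketch below scores a single pairwise global alignment with the Needleman-Wunsch dynamic program. The scoring values are illustrative assumptions, not a standard substitution matrix such as BLOSUM62:

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # assumed toy scoring scheme

def needleman_wunsch(a: str, b: str) -> int:
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Initialize the first row and column with cumulative gap penalties.
    for i in range(1, rows):
        score[i][0] = score[i - 1][0] + GAP
    for j in range(1, cols):
        score[0][j] = score[0][j - 1] + GAP
    # Fill the table: each cell considers a match/mismatch and two gaps.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            score[i][j] = max(diag, score[i - 1][j] + GAP, score[i][j - 1] + GAP)
    return score[rows - 1][cols - 1]

print(needleman_wunsch("MVHLTPEEKS", "MVHLSPEEKT"))  # 6
```

Even this two-sequence case fills a table of len(a) × len(b) cells; the exact dynamic program for k sequences fills a k-dimensional table whose size is the product of all k lengths, which is why simultaneous alignment of many long sequences overwhelms desktop machines.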
In numerical-processing applications such as pattern matching, speed of computation is valued above
all else. As every computer hardware manufacturer knows, speed sells. A PC or workstation that was