leading-edge two years ago now seems slow, given that each new processor generation provides leaps in
processing power, as predicted by Intel co-founder Gordon Moore. Moore's Law,
which states that microprocessor capacity doubles roughly every 18 months, has held up thus far, and will
likely do so for the next decade. Processing speed is critical for the analysis of biological data,
especially for processing-intensive tasks such as visualization, sequence alignment, and sequence
prediction. For example, Craig Venter of Celera Genomics backed up his statement that
"Speed matters, discovery can't wait" with an $80 million supercomputer. Similarly, for the
researcher using Web-based tools to search for sequences, the processing time required for each
search effectively limits the number of searches that can be performed in a working day.
In addition to the string-matching and manipulation operations built into languages such as
Perl, Java, or PHP, the most significant pattern-matching tools are
rules-based expert systems, artificial neural networks, and genetic algorithms. Rules-based systems,
all of which have been developed on the digital computer, rely on IF-THEN clauses. For example:
IF First Base = "T" AND Second Base = "A"
AND (Third Base = "A" OR Third Base = "G")
THEN Codon = "Stop"
Rules-based systems are often developed in a language such as LISP and then recoded in Java,
C++, XML, or another more efficient language.
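The stop-codon rule above can be sketched as a small rules-based classifier in Python. This is a minimal illustration, not the text's implementation; the function name `classify_codon` and the extra rules for TGA and ATG (both part of the standard genetic code) are additions for completeness.

```python
# Hedged sketch: the IF-THEN stop-codon rule from the text, expressed
# as a Python function. TGA (the third stop codon) and ATG (start)
# are standard-genetic-code facts added for completeness.

def classify_codon(codon: str) -> str:
    """Apply simple IF-THEN rules to a three-base DNA codon."""
    codon = codon.upper()
    base1, base2, base3 = codon
    # The rule from the text: T, A, then A or G -> Stop (TAA, TAG)
    if base1 == "T" and base2 == "A" and base3 in ("A", "G"):
        return "Stop"
    if codon == "TGA":          # the remaining stop codon
        return "Stop"
    if codon == "ATG":          # the canonical start codon
        return "Start"
    return "Coding"

print(classify_codon("TAG"))    # the rule in the text fires here
```

A production rules engine would keep such rules in a declarative rule base rather than hard-coding them, but the IF-THEN structure is the same.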
Another class of pattern-matching programs uses machine learning, typified by the artificial neural
network (or neural net). Unlike expert systems, neural nets don't rely on conventional algorithmic
programming techniques, but work by altering the strength of connections between input and output
nodes. The advantage of artificial neural networks over conventional, algorithm-based systems is that
they can learn from examples, and generalize this learning to new situations. For example, the input
nodes can be associated with amino acid sequences and the output nodes can be associated with
specific protein folding patterns. When a novel amino acid sequence is presented to the neural net, it
can make a guess as to the folding pattern of the protein. A rules-based system, like a conventional
algorithm-based system, would simply fail on a novel sequence. In addition to protein structure
prediction, neural networks are used as "gene finders," typified by the applications GRAIL,
GeneParser, and Genie.
Artificial neural networks are commonly created in layers, with one or more hidden layers sandwiched
between the input and output layers. It is the hidden layer that does most of the work involved in
classifying or recognizing the pattern presented to the input layer. Learning is represented by the
relative strengths of the connections between the individual nodes, which are defined during training
of the network. That is, the internal representation is inherently analog in nature, even though the input
and output states are mapped to binary values. During training, important pathways are
strengthened, and unimportant ones diminish with experience (repeated training).
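As a deliberately tiny illustration of why the hidden layer does "most of the work," the sketch below hand-sets the weights of a 2-2-1 network of binary threshold units so that it computes XOR, a mapping no single-layer network can represent. The architecture and weights are assumptions chosen for clarity; a real network would arrive at such weights through training.

```python
# Minimal sketch (assumed 2-2-1 architecture, hand-set weights):
# a network of binary threshold units computing XOR. The hidden
# layer does the real work: h1 acts as OR, h2 as AND, and the
# output fires only when OR is true and AND is false.

def step(x: float) -> int:
    """Binary threshold activation: fire (1) if input exceeds 0."""
    return 1 if x > 0 else 0

def xor_net(x1: int, x2: int) -> int:
    h1 = step(x1 + x2 - 0.5)      # hidden unit 1: logical OR
    h2 = step(x1 + x2 - 1.5)      # hidden unit 2: logical AND
    return step(h1 - h2 - 0.5)    # output: OR AND NOT AND = XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

Training would adjust the numeric weights and thresholds from data rather than fixing them by hand; strengthening a connection corresponds to increasing its weight.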
In the elementary artificial neural network schematic shown in Figure 1-14 , the network consists of
three input and two output nodes, with each input node connected to both output nodes. The
possible truth table shown in the figure is the result of the specific training of the network. Other
truth tables would result from other training.
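A network of the kind described above, with three inputs and two outputs, can be emulated with fixed connection strengths to see how one particular set of weights yields one particular truth table. The weights below are arbitrary illustrative choices (output 1 fires on a majority of active inputs, output 2 on any active input); different weights, i.e. different training, would produce a different table.

```python
# Sketch of a 3-input, 2-output network of threshold units with
# fixed, illustrative connection strengths. Training would normally
# set these weights; here they are chosen so that output 1 computes
# "majority of inputs on" and output 2 computes "any input on".
from itertools import product

def step(x: float) -> int:
    return 1 if x > 0 else 0

def net(x1: int, x2: int, x3: int) -> tuple[int, int]:
    s = x1 + x2 + x3                      # equal weight on every connection
    return step(s - 1.5), step(s - 0.5)   # majority, OR

for bits in product((0, 1), repeat=3):    # enumerate the truth table
    print(bits, "->", net(*bits))
```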
Figure 1-14. Artificial Neural Network. This machine-learning technology
relies on tightly interconnected input, hidden, and output layers to map
input patterns to output patterns. One of many possible truth tables (right)
illustrates the mapping of input to output patterns. Learning is signified by
the thickness of lines joining nodes, and node values are indicated by color
(white = 0 and black = 1). Hidden nodes can take on values between 0 and
1.