leading-edge two years ago now seems slow, given that each new processor generation provides leaps in
processing power, as predicted by Intel co-founder Gordon Moore. Moore's Law,
which states that microprocessor capacity doubles roughly every 18 months, has held up thus far, and will
likely do so for the next decade. Processing speed is critical for the analysis of biological data,
especially for processing-intensive tasks such as visualization, sequence alignment, and sequence
prediction. For example, Craig Venter of Celera Genomics backed up his statement that
"Speed matters, discovery can't wait" with an $80 million supercomputer. Similarly, for the
researcher using Web-based tools to search for sequences, the processing time required for each
search effectively limits the number of searches that can be performed in a working day.
In addition to the string-matching and manipulation operations built into languages such as
Perl, Java, or PHP, the most significant pattern-matching tools are
rules-based expert systems, artificial neural networks, and genetic algorithms. Rules-based systems,
all of which have been developed on the digital computer, rely on IF-THEN clauses. For example:
IF First Base = "T" AND Second Base = "A"
AND (Third Base = "A" OR Third Base = "G")
THEN Codon = "Stop"
Rules-based systems are often developed in a language such as LISP and then recoded in Java,
C++, XML, or another more efficient language.
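The stop-codon rule above can be sketched as a small rules-based classifier in Python. This is a minimal illustration, not the text's implementation; the function name `classify_codon` and the extra rules for TGA and ATG (both part of the standard genetic code) are additions for completeness.

```python
# Hedged sketch: the IF-THEN stop-codon rule from the text, expressed
# as a Python function. TGA (the third stop codon) and ATG (start)
# are standard-genetic-code facts added for completeness.

def classify_codon(codon: str) -> str:
    """Apply simple IF-THEN rules to a three-base DNA codon."""
    codon = codon.upper()
    base1, base2, base3 = codon
    # The rule from the text: T, A, then A or G -> Stop (TAA, TAG)
    if base1 == "T" and base2 == "A" and base3 in ("A", "G"):
        return "Stop"
    if codon == "TGA":          # the remaining stop codon
        return "Stop"
    if codon == "ATG":          # the canonical start codon
        return "Start"
    return "Coding"

print(classify_codon("TAG"))    # the rule in the text fires here
```

A production rules engine would keep such rules in a declarative rule base rather than hard-coding them, but the IF-THEN structure is the same.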
Another class of pattern-matching programs uses machine learning, typified by the artificial neural
network (or neural net). Unlike expert systems, neural nets don't rely on conventional algorithmic
programming techniques, but work by altering the strength of connections between input and output
nodes. The advantage of artificial neural networks over conventional, algorithm-based systems is that
they can learn from examples, and generalize this learning to new situations. For example, the input
nodes can be associated with amino acid sequences and the output nodes can be associated with
specific protein folding patterns. When a novel amino acid sequence is presented to the neural net, it
can make a guess as to the folding pattern of the protein. A rules-based system, like a conventional
algorithm-based system, would simply fail on a novel sequence. In addition to protein structure
prediction, neural networks are used as "gene finders," typified by the applications GRAIL,
GeneParser, and Genie.
Artificial neural networks are commonly created in layers, with one or more hidden layers sandwiched
between the input and output layers. It is the hidden layer that does most of the work involved in
classifying or recognizing the pattern presented to the input layer. Learning is represented by the
relative strengths of the connections between the individual nodes, which are defined during training
of the network. That is, the internal representation is inherently analog in nature, even though the input
and output states are mapped to binary values. During training, important pathways are
strengthened, and unimportant ones diminish with experience (repeated training).
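As a deliberately tiny illustration of why the hidden layer does "most of the work," the sketch below hand-sets the weights of a 2-2-1 network of binary threshold units so that it computes XOR, a mapping no single-layer network can represent. The architecture and weights are assumptions chosen for clarity; a real network would arrive at such weights through training.

```python
# Minimal sketch (assumed 2-2-1 architecture, hand-set weights):
# a network of binary threshold units computing XOR. The hidden
# layer does the real work: h1 acts as OR, h2 as AND, and the
# output fires only when OR is true and AND is false.

def step(x: float) -> int:
    """Binary threshold activation: fire (1) if input exceeds 0."""
    return 1 if x > 0 else 0

def xor_net(x1: int, x2: int) -> int:
    h1 = step(x1 + x2 - 0.5)      # hidden unit 1: logical OR
    h2 = step(x1 + x2 - 1.5)      # hidden unit 2: logical AND
    return step(h1 - h2 - 0.5)    # output: OR AND NOT AND = XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

Training would adjust the numeric weights and thresholds from data rather than fixing them by hand; strengthening a connection corresponds to increasing its weight.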
In the elementary artificial neural network schematic shown in Figure 1-14 , the network consists of
three input and two output nodes, with each input node connected to both output nodes. The
possible truth table shown in the figure is the result of the specific training of the network. Other
truth tables would result from other training.
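A network of the kind described above, with three inputs and two outputs, can be emulated with fixed connection strengths to see how one particular set of weights yields one particular truth table. The weights below are arbitrary illustrative choices (output 1 fires on a majority of active inputs, output 2 on any active input); different weights, i.e. different training, would produce a different table.

```python
# Sketch of a 3-input, 2-output network of threshold units with
# fixed, illustrative connection strengths. Training would normally
# set these weights; here they are chosen so that output 1 computes
# "majority of inputs on" and output 2 computes "any input on".
from itertools import product

def step(x: float) -> int:
    return 1 if x > 0 else 0

def net(x1: int, x2: int, x3: int) -> tuple[int, int]:
    s = x1 + x2 + x3                      # equal weight on every connection
    return step(s - 1.5), step(s - 0.5)   # majority, OR

for bits in product((0, 1), repeat=3):    # enumerate the truth table
    print(bits, "->", net(*bits))
```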
Figure 1-14. Artificial Neural Network. This machine-learning technology
relies on tightly interconnected input, hidden, and output layers to map
input patterns to output patterns. One of many possible truth tables (right)
illustrates the mapping of input to output patterns. Learning is signified by
the thickness of lines joining nodes, and node values are indicated by color
(white = 0 and black = 1). Hidden nodes can take on values between 0 and
1.