gains the ability to form internal representations for encoding features
of the input, thus creating new classes in an autonomous fashion. The
ability to create internal representations or an internal mapping from
input data under unsupervised learning is also known as “self-organized”
learning, which is the principal learning paradigm for the training
and operation of another form of ANN known as a self-organizing map.
The learning tasks for which an ANN is typically trained include:
1) pattern recognition, whereby the input or received pattern/signal
is assigned to one of a predefined number of classes; 2) function
approximation, which implies designing an ANN that approximates
an unknown function [y = f(x)] describing the input-output mapping;
and 3) classification, which typically consists of assigning input vectors
to one of two or three classes (e.g. yes or no, numerically mapped to
0 and 1, respectively; or high, medium, and low, numerically mapped
to 1, 0, and −1, respectively).
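The numeric class mapping described for the classification task can be sketched in a few lines of Python. This is only an illustrative sketch: the dictionary names and the helper functions `encode_labels` and `decode_output` are hypothetical and do not come from the text, and the nearest-code rule used for decoding is an assumed convention, not a method the text prescribes. The label values follow the examples in the text (yes and no mapped to 0 and 1; high, medium, and low mapped to 1, 0, and −1).

```python
# Hypothetical label-to-number mappings, following the examples in the text.
BINARY_CODES = {"yes": 0, "no": 1}
TERNARY_CODES = {"high": 1, "medium": 0, "low": -1}


def encode_labels(labels, codes):
    """Map each class label to its numeric target value."""
    try:
        return [codes[label] for label in labels]
    except KeyError as err:
        raise ValueError(f"unknown class label: {err.args[0]!r}") from None


def decode_output(value, codes):
    """Return the label whose numeric code lies closest to a raw output value.

    Assumed convention: a network output is rarely exactly 1, 0, or -1,
    so the nearest code is taken as the predicted class.
    """
    return min(codes, key=lambda label: abs(codes[label] - value))


print(encode_labels(["high", "low", "medium"], TERNARY_CODES))  # [1, -1, 0]
print(decode_output(0.8, TERNARY_CODES))                        # high
```

Keeping the mapping in a single dictionary means the same table serves both directions: encoding training targets and interpreting the network's numeric output.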
ANN requirements. Neural Networks and Genome Informatics, by
Cathy H. Wu and Jerry W. McLarty (ref. 115), provides detailed
descriptions of the types of learning tasks and biomedical applications of
ANNs and the transformation schemes of genomic and proteomic
data for vector representation. Their book also addresses some of the
requirements and factors that affect the application of ANNs to
molecular sequence analysis, such as feature representation, data
encoding, analysis of data containing sequences of varying length, and
analysis of data sets with limited or missing data. Encoding schemes
for protein feature representation, a process usually referred to as data
pre-processing, and data post-processing, are described below.
Data pre-processing. Data pre-processing consists of data encoding
and feature representation. Neural networks, like many other
machine-learning algorithms, require numerical values for processing.
Data encoding addresses the problem of protein representation for
mathematical and computational processing. A protein sequence is
composed of a series of amino acids represented by the characters
A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, and Y.
A sequence of alphabetic characters cannot, however, be used in a
mathematical computation because none of the elements has a numeric
value. Thus, representation of a protein sequence in a numerical and