when the position is outside the N/C-terminal region (1 if outside and 0 if not),
and 1 unit accounts for the conservation weight at that position (see below for its
definition). The output layer of the first-level NN consists of three nodes, one for each
possible secondary structure state (helix, strand, coil), corresponding to the state
of the central residue in the window. The first-level NN therefore classifies 13-residue-long
protein segments according to the secondary structure class of their central
residue. This classification ignores the fact that different segments
can be correlated, for example because they are consecutive and overlap in the protein
sequence. In particular, at this level the NN has no knowledge of the correlation
between secondary structure elements. For example, it has no way to know that
a helix consists of at least three consecutive elements.
2. The second level is introduced to take into account the correlation between
consecutive secondary structure elements. The input of the second-level NN is
compiled from the output of the first-level NN. For every residue position, the input
layer encodes a window of 17 consecutive elements taken from the secondary
structure prediction of the first NN. Every position in the window is encoded
with 5 units: three for the predicted secondary structure, one to flag whether
the position is outside the boundaries of the protein, and one for the conservation
weight. The output layer is defined as in the first NN and, in this case too, corresponds to
the state of the central residue in the window.
3. The consensus is a simple arithmetic average of the outputs of (typically four) differently
trained networks. The output unit with the highest value is taken as the final
prediction. A reliability index can be associated with every such prediction using
the following formula:
$$\mathrm{RI} = \left\lceil 10\,(o_1 - o_2) \right\rceil \qquad (2.4)$$
where $o_1$ and $o_2$ are the highest and second-highest values in the output vector,
respectively. The resulting prediction is finally filtered (with the help of the
reliability index) to correct any unrealistic local predictions that
neither the second-level NN nor the consensus was able to detect (in particular,
alpha-helix segments that are too short).
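The three steps above can be sketched in Python as a minimal illustration under stated assumptions, not the authors' implementation. The helper names (`encode_windows`, `consensus`, `predict_with_ri`), the order of the units within each window position, the zero conservation weight given to out-of-chain positions, and the H/E/C order of the three outputs are all hypothetical. The per-residue features fed to the first level are not specified in this excerpt, so any fixed-length vector is accepted. The same windowing function serves both levels: a 13-residue window over the first-level input features, and a 17-residue window over the three first-level outputs (3 + 1 + 1 = 5 units per position). The brackets in Eq. (2.4) are read here as a ceiling.

```python
import math
from typing import List, Sequence, Tuple

STATES = ("H", "E", "C")  # helix, strand, coil (output order is an assumption)


def encode_windows(features: Sequence[Sequence[float]],
                   conservation: Sequence[float],
                   window: int) -> List[List[float]]:
    """Return one input vector per residue, built from a centred sliding window.

    Each window position contributes len(features[0]) feature units, one
    'outside the protein' flag and one conservation-weight unit.
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd so it has a central residue")
    n, n_feat, half = len(features), len(features[0]), window // 2
    vectors = []
    for centre in range(n):
        vec: List[float] = []
        for pos in range(centre - half, centre + half + 1):
            if 0 <= pos < n:
                vec.extend(features[pos])
                vec.append(0.0)                # inside the protein
                vec.append(conservation[pos])
            else:
                vec.extend([0.0] * n_feat)     # no residue beyond the N/C terminus
                vec.append(1.0)                # outside flag set
                vec.append(0.0)                # assumption: no conservation weight
        vectors.append(vec)
    return vectors


def consensus(outputs_per_net: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Arithmetic average of the per-residue 3-unit outputs of several networks."""
    n_nets = len(outputs_per_net)
    n_res = len(outputs_per_net[0])
    return [[sum(net[i][k] for net in outputs_per_net) / n_nets for k in range(3)]
            for i in range(n_res)]


def predict_with_ri(averaged: Sequence[Sequence[float]]) -> List[Tuple[str, int]]:
    """Pick the highest output unit per residue and attach its reliability index."""
    result = []
    for out in averaged:
        o1, o2 = sorted(out, reverse=True)[:2]
        state = STATES[list(out).index(o1)]
        # Eq. (2.4) with the brackets read as a ceiling; rounding first keeps float
        # noise (10 * (0.8 - 0.1) == 7.000000000000001) from bumping RI up by one.
        ri = math.ceil(round(10 * (o1 - o2), 9))
        result.append((state, ri))
    return result


if __name__ == "__main__":
    # Two hypothetical trained networks, each giving outputs for two residues
    net_a = [[0.8, 0.1, 0.1], [0.2, 0.2, 0.6]]
    net_b = [[0.6, 0.3, 0.1], [0.2, 0.4, 0.4]]
    print(predict_with_ri(consensus([net_a, net_b])))  # [('H', 5), ('C', 2)]
```

If the source intends truncation rather than a ceiling, replacing `math.ceil` with `math.floor` is the only change needed; the filtering of too-short helices is left out because its rules are not given here.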
The conservation weight scores each position in the MSA by its level of
conservation: the more conserved a position is, the higher its conservation
weight. This weight is contained in the HSSP database and is
defined by
$$CW_i = \frac{\sum_{r,s=1}^{N} w_{rs}\,\mathrm{sim}^{i}_{rs}}{\sum_{r,s=1}^{N} w_{rs}} \qquad (2.5)$$
with
$$w_{rs} = 1 - \frac{\mathrm{ident}_{rs}}{100},$$
where $N$ is the number of sequences in the multiple alignment, $\mathrm{ident}_{rs}$ is the percentage
of sequence identity (over the entire length) of sequences $r$ and $s$, and $\mathrm{sim}^{i}_{rs}$
is the value of the similarity between sequences $r$ and $s$ at position $i$ according to the
Dayhoff similarity matrix [8].
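Eq. (2.5) can be checked with a short Python sketch. The function name `conservation_weight` and its inputs are hypothetical: the caller is assumed to supply the pairwise percent identities and the similarity values at column i already looked up in a similarity matrix (the Dayhoff matrix values themselves are not reproduced here). Because a sequence is 100% identical to itself, w_rr = 0, so including the r = s terms in the double sum leaves the result unchanged.

```python
from typing import Sequence


def conservation_weight(ident: Sequence[Sequence[float]],
                        sim_i: Sequence[Sequence[float]]) -> float:
    """Conservation weight CW_i of one alignment column, following Eq. (2.5).

    ident[r][s]: percentage identity (0-100) between sequences r and s over
                 their entire length.
    sim_i[r][s]: similarity of the residues of r and s at column i, assumed to
                 be already looked up in a similarity matrix such as Dayhoff's.
    """
    n = len(ident)
    numerator = 0.0
    denominator = 0.0
    for r in range(n):
        for s in range(n):
            w_rs = 1.0 - ident[r][s] / 100.0  # pair weight: dissimilar pairs count more
            numerator += w_rs * sim_i[r][s]
            denominator += w_rs
    if denominator == 0.0:
        # every pair is 100% identical (or N == 1): CW_i is undefined
        raise ValueError("all pair weights are zero")
    return numerator / denominator
```

The weight w_rs shrinks as two sequences become more similar overall, so near-duplicate sequences contribute little and a column scores high only when residues are conserved across divergent sequences.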
 