Biomedical Engineering Reference
In-Depth Information
3.5 Experimental Results
In this implementation, the amino acid sequence of each protein is subdivided into
segments of 13 amino acids. Each amino acid is numbered from 1 to 20 and coded as a
five-bit string, so that each pattern vector is composed of 65 binary elements; the
66th element holds the class label of the central amino acid of the segment, element
number 7. The training set is taken from the Protein Data Bank (PDB) [3], and each
pattern vector corresponds to one such segment of the amino acid sequence of a protein.
No multiple alignment information is included. The subsequent segment again contains
13 consecutive amino acids, starting from the second one of the preceding segment and
adding as its 13th element the amino acid that follows the final one of the preceding
segment. Two consecutive patterns thus differ in their first and last elements: the
earlier first amino acid is dropped and a new last one is appended. Formally, a window
of 13 amino acids is considered, and each pattern is formed by shifting the window by
one position. Particular techniques are applied to initialise and terminate the
patterns of a protein, and the class assigned to each pattern is always the folding
class of the seventh element in the pattern [5].
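
The windowing and coding just described can be illustrated with a minimal sketch. Several details are assumptions made only for illustration and are not given in the text: the numbering order of the amino acids (alphabetical by one-letter code), the use of the plain binary representation of that number as the five-bit string, integer class labels, and the names `AMINO_ACIDS`, `encode_residue` and `make_patterns`. The special handling used to initialise and terminate the patterns of a protein is not specified in the text, so the sketch emits only complete windows.

```python
# Assumed numbering: the 20 standard amino acids in alphabetical order
# of their one-letter codes, numbered 1..20 (hypothetical choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WINDOW = 13            # amino acids per segment
CENTRE = WINDOW // 2   # index 6, i.e. element number 7 of the segment


def encode_residue(aa):
    """Return the residue number (1..20) as five bits, most significant first."""
    number = AMINO_ACIDS.index(aa) + 1  # raises ValueError for unknown codes
    return [(number >> shift) & 1 for shift in range(4, -1, -1)]


def make_patterns(sequence, classes):
    """Build one 66-element pattern per complete window of 13 residues.

    Elements 1..65 are the five-bit codes of the 13 residues; element 66
    is the class label of the central (seventh) residue of the window.
    """
    if len(sequence) != len(classes):
        raise ValueError("sequence and classes must have the same length")
    patterns = []
    for start in range(len(sequence) - WINDOW + 1):
        window = sequence[start:start + WINDOW]
        bits = [bit for aa in window for bit in encode_residue(aa)]
        patterns.append(bits + [classes[start + CENTRE]])
    return patterns


# Usage with a 14-residue toy sequence and made-up class labels.
toy_sequence = "ACDEFGHIKLMNPQ"
toy_classes = [0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 2, 2, 2, 2]
toy_patterns = make_patterns(toy_sequence, toy_classes)
```

With these toy inputs two patterns are produced, and the last 60 bits of the first pattern coincide with the first 60 bits of the second, which is the one-position shift of the window.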
All the sequences of the proteins included in a training set are compared pairwise
to determine the number of aligned amino acids common to the two proteins. An
appropriate procedure is used to obtain the largest number of aligned amino acids by
sliding the two sequences up and down and also inserting pieces of the string,
according to strict rules [3]. For the similarity classification of the proteins in
the training set, the largest alignment value of each protein is determined from the
percentage of amino acids aligned with all the other proteins in the training set, and
eight convenient classes of similitude are defined by setting suitable intervals of
alignment percentage values. Table 3.1 shows the similarity classes together with the
corresponding intervals of similitude, that is, the interval containing the largest
alignment percentage of a protein in the training set.
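
To make the pairwise comparison concrete, the following sketch computes, for each protein, its largest alignment fraction against the others. It does not reproduce the procedure of [3], whose rules are not given here. As a stand-in it counts aligned identical residues with the longest common subsequence, which allows sliding and gaps but applies no scoring rules. The normalisation by the length of the shorter sequence and the function names are likewise assumptions.

```python
def aligned_count(a, b):
    """Length of the longest common subsequence of a and b.

    Stand-in for the alignment procedure: the largest number of identical
    residues that can be aligned in order when gaps are allowed.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similitude(a, b):
    """Fraction of aligned residues, relative to the shorter sequence (assumed)."""
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    return aligned_count(a, b) / min(len(a), len(b))


def largest_similitude(sequences):
    """For each sequence, the largest similitude with any other sequence."""
    if len(sequences) < 2:
        raise ValueError("at least two sequences are required")
    return [
        max(similitude(a, b) for j, b in enumerate(sequences) if j != i)
        for i, a in enumerate(sequences)
    ]


# Usage with three made-up sequences.
toy_scores = largest_similitude(["AAAA", "AAAC", "GGGG"])
```

The double loop over residues makes each comparison quadratic in sequence length, which is acceptable for a sketch but would need a proper alignment tool for a full training set.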
For the purpose of this analysis, without loss of generality, proteins belonging
to an isoform class are defined as proteins belonging to similarity class 7. This is
taken as a necessary condition but is not a sufficient condition, since isoforms may
have very different similarity, in which case the markers can be easily identified by
traditional methods. Here, it is important to determine isoforms of proteins, which
Table 3.1 Similarity classes and percentage similarity among proteins

Similarity class    Similitude
0                   <0.30
1                   0.30-0.40
2                   0.40-0.50
3                   0.50-0.60
4                   0.60-0.70
5                   0.70-0.80
6                   0.80-0.90
7                   >0.90
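
The mapping of Table 3.1 from a similitude value to a similarity class can be sketched as a simple lookup. The table does not say to which class a value lying exactly on a boundary belongs. The sketch assumes that each lower bound is inclusive and that 0.90 itself falls in class 6, since class 7 is given as strictly greater than 0.90. The function name `similarity_class` is hypothetical.

```python
from bisect import bisect_right

# Lower bounds of classes 1..6 as listed in Table 3.1.
LOWER_BOUNDS = [0.30, 0.40, 0.50, 0.60, 0.70, 0.80]


def similarity_class(similitude):
    """Return the similarity class (0..7) for a similitude in [0, 1].

    Assumed boundary handling: lower bounds are inclusive, and class 7
    starts strictly above 0.90, as the '>0.90' entry suggests.
    """
    if not 0.0 <= similitude <= 1.0:
        raise ValueError("similitude must lie between 0 and 1")
    if similitude > 0.90:
        return 7
    # Number of lower bounds that are <= similitude gives classes 0..6.
    return bisect_right(LOWER_BOUNDS, similitude)


# Usage: class of a protein whose largest alignment fraction is 0.95.
example_class = similarity_class(0.95)
```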
 