Biomedical Engineering Reference
accuracy of 91% and 73%, respectively, for correctly identifying TMBs and excluding
globular proteins. The application of the method is illustrated in Fig. 1.
It takes an amino acid sequence as input and computes the residue pair
preference and weighted composition. The weighted composition of the query
protein is +89.16, and hence it is identified as an outer membrane protein or TMS
protein.
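The two quantities named above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: residue pairs are taken to be adjacent residues (dipeptides), compositions are expressed as percentages, and the function names and the `weights` argument are hypothetical. The actual weights and the decision threshold of the published method are not given in this section and are not reproduced here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Percentage of each of the 20 standard residues in seq."""
    seq = seq.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def residue_pair_composition(seq):
    """Percentage of each of the 400 adjacent residue pairs in seq.

    Assumption: a "residue pair" is a pair of neighbouring residues.
    Pairs containing a non-standard character are skipped.
    """
    seq = seq.upper()
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(
        p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residue pairs")
    return {
        a + b: 100.0 * counts[a + b] / total
        for a in AMINO_ACIDS
        for b in AMINO_ACIDS
    }


def weighted_score(comp, weights):
    """Weighted sum of composition values.

    `weights` is a hypothetical, caller-supplied mapping; keys missing
    from it contribute zero.
    """
    return sum(weights.get(key, 0.0) * value for key, value in comp.items())
```

A score computed this way is only comparable with the value quoted above if the same weights as in the original method are supplied.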
3.4.2 Machine-Learning Techniques
Currently, machine-learning techniques are widely used for predicting several
important features in bioinformatics, including the secondary structures of proteins,
solvent accessibility, protein-protein interactions, and protein-nucleic acid
interactions [46-50]. Machine-learning algorithms have been reported to
achieve the highest level of accuracy. These methods include Bayes functions,
neural networks, logistic functions, support vector machines, regression analysis,
nearest neighbor methods, meta learning, decision trees, and rules. The details
of all these methods are explained in Gromiha and Suwa [51].
Martelli et al. [52] used 12 TMBs and developed a sequence-profile-based HMM
for identifying β-barrel membrane proteins and reported an accuracy of 84% in a
set of 145 TMBs. Bagos et al. [53] developed an algorithm based on HMM for
discriminating TMBs and reported an accuracy of 89% in a set of 133 TMBs. Natt
et al. [54] used a set of 16 TMBs and proposed a method combining
neural networks and support vector machines for discrimination, which showed an
average accuracy of 90% in a set of 100 randomly selected globular proteins and 16 TMBs.
Garrow et al. [55] proposed a modified k-nearest neighbor algorithm and reported
an accuracy of 92.5% using weighted amino acids and evolutionary information.
Gromiha and Suwa [51] analyzed the performance of different machine-learning
techniques and found no significant difference in performance between
them; most of the methods could discriminate
TMBs with an accuracy in the range of 88-91% in a set of 1,088 proteins. Further,
tuning the adjustable parameters of these methods could make it
possible for any one method to perform better than the others.
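To make the nearest-neighbor idea mentioned above concrete, the following is a minimal sketch of a plain k-nearest neighbor classifier over amino acid composition vectors. It is not the modified algorithm of Garrow et al. [55] and uses no weighting or evolutionary information. The function names, the Euclidean distance, and the default k = 3 are assumptions made for illustration only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(seq):
    """Fraction of each of the 20 standard residues, in a fixed order."""
    seq = seq.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def knn_predict(query, training, k=3):
    """Majority vote among the k training vectors closest to `query`.

    `training` is a list of (vector, label) tuples; distance is Euclidean
    (an assumption, other metrics are equally possible).
    """
    if not training:
        raise ValueError("training set is empty")
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In use, each training protein would be converted with `composition_vector` and paired with its known label (for example "TMB" or "globular"); the value of k is one of the adjustable parameters whose choice affects the reported accuracy.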
Residue pair preferences and amino acid properties have also been analyzed as
descriptors for discriminating TMBs using machine-learning
techniques [51, 56]. It has been observed that the usage of all the 400
residue pairs increased the accuracy up to 94.5% using support vector machines.
In general, using many parameters can cause overfitting, and hence Park
et al. [56] selected a few amino acids and residue pairs for discrimination. The
combination of the compositions of 18 amino acid residues (all except Ala and Glu)
and 10 residue pairs (QA, DF, DA, KK, EF, NK, DR, YN, FF, and LI) improved the
accuracy up to 93.9% for discriminating TMBs from other folding types of globular
and membrane proteins.
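The reduced descriptor set described above (18 residue compositions plus 10 residue pair compositions) can be assembled into a 28-element feature vector as in the sketch below. The residue and pair lists are taken from the text; the function name, the percentage normalization, and the treatment of pairs as adjacent residues are assumptions, and the classifier that would consume the vector is not shown.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# 18 residues: all standard amino acids except Ala (A) and Glu (E).
SELECTED_RESIDUES = [aa for aa in AMINO_ACIDS if aa not in "AE"]

# The 10 residue pairs listed in the text.
SELECTED_PAIRS = ["QA", "DF", "DA", "KK", "EF", "NK", "DR", "YN", "FF", "LI"]


def feature_vector(seq):
    """Return 18 residue percentages followed by 10 pair percentages.

    Assumptions: pairs are adjacent residues, residue percentages are
    relative to the sequence length, and pair percentages are relative
    to the number of adjacent pairs (length - 1).
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two residues")
    residue_counts = Counter(seq)
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    residue_part = [100.0 * residue_counts[aa] / n for aa in SELECTED_RESIDUES]
    pair_part = [100.0 * pair_counts[p] / (n - 1) for p in SELECTED_PAIRS]
    return residue_part + pair_part
```

Keeping the feature order fixed, as the two module-level lists do here, matters because a trained classifier expects each position of the vector to carry the same descriptor for every protein.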
Further, Gromiha and Suwa [51] applied 49 different physicochemical, energetic,
and conformational parameters of amino acid residues for discriminating
β