Fig. 1.1 Overview of homology detection methods. (a) Sequence-based, alignment-dependent
methods. (b) Structure-assisted, alignment-dependent methods. (c) Alignment-free methods.

feature vectors as homologs. Early methods such as [13, 14] compare feature vectors
in a straightforward way, so they are not very sensitive. The application of
machine learning greatly improves alignment-free homology detection. Machine
learning methods formulate homology detection and fold recognition as a
classification problem. One typical way is to build a binary classifier for each
specific protein class (e.g., superfamily or fold) to identify protein sequences
belonging to this class. It has been shown that, given sufficient training data,
discriminative or supervised machine learning is in general superior to generative
or unsupervised learning. In particular, a few discriminative learning methods
have been developed using Support Vector Machines (SVM) [15-17]. These SVM methods
differ primarily in the kernel functions used to measure the distance (or
similarity) between two proteins. Examples include SVM-Fisher [17], SVM-Pairwise
[18], SVM with the spectrum kernel [19], and SVM with the mismatch kernel [20].
See [21] for a review of these methods. These SVM methods are reported to
outperform the simple feature comparison methods [18, 19].
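To make the kernel idea concrete, here is a minimal Python sketch of the spectrum and mismatch kernels as they are commonly defined; it is not the implementation used in [19] or [20]. It assumes sequences over the 20 standard amino acids, and the function names, toy sequences, and the values of k and m are illustrative only. The spectrum kernel counts the contiguous k-mers two sequences share; the (k, m)-mismatch kernel additionally credits k-mers that differ in at most m positions. Mismatch neighborhoods are enumerated explicitly here, which is only practical for small k and m.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet: 20 standard amino acids


def spectrum_features(seq: str, k: int) -> Counter:
    """Sparse k-spectrum vector: how often each contiguous k-mer occurs in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def mismatch_neighborhood(kmer: str, m: int, alphabet: str = AMINO_ACIDS) -> set:
    """All strings over `alphabet` within Hamming distance m of `kmer` (itself included)."""
    neighborhood = {kmer}
    frontier = {kmer}
    for _ in range(m):
        # Apply one more substitution to every string found in the previous round.
        new = {
            s[:i] + a + s[i + 1:]
            for s in frontier
            for i in range(len(s))
            for a in alphabet
            if a != s[i]
        }
        frontier = new - neighborhood  # keep only strings at the new, larger distance
        neighborhood |= frontier
    return neighborhood


def mismatch_features(seq: str, k: int, m: int) -> Counter:
    """Sparse (k, m)-mismatch vector: for each k-mer alpha, the number of k-mers
    in seq that lie within Hamming distance m of alpha."""
    features = Counter()
    for kmer, count in spectrum_features(seq, k).items():
        for alpha in mismatch_neighborhood(kmer, m):
            features[alpha] += count
    return features


def kernel(f: Counter, g: Counter) -> int:
    """Inner product of two sparse feature vectors (missing keys count as 0)."""
    if len(f) > len(g):
        f, g = g, f  # iterate over the smaller vector
    return sum(v * g[key] for key, v in f.items())


def normalized_kernel(f: Counter, g: Counter) -> float:
    """Cosine-normalized kernel in [0, 1]; assumes both sequences have >= k residues."""
    return kernel(f, g) / math.sqrt(kernel(f, f) * kernel(g, g))


if __name__ == "__main__":
    x, y = "MKVLAAGIVG", "MKVLSAGIVG"  # toy sequences, not real proteins
    print(kernel(spectrum_features(x, 3), spectrum_features(y, 3)))  # 5 shared 3-mers
    print(normalized_kernel(spectrum_features(x, 3), spectrum_features(y, 3)))  # 0.625
    print(normalized_kernel(mismatch_features(x, 3, 1), mismatch_features(y, 3, 1)))
```

With m = 0 the mismatch features reduce to the spectrum features. In an SVM setting, the pairwise kernel values over a training set would form the Gram matrix, which could be passed to an implementation that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`, with one binary classifier trained per protein class.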
Alignment-free methods face two major challenges. The first is how to represent a
protein as a feature vector that contains enough information for homology
detection. Many popular machine learning methods, such as SVM, require a feature
vector of fixed dimension regardless of the length of the protein. This implies
that position-specific protein features have to be compressed or transformed,
which might lead to a large loss of information. The second is that a feature
vector encodes information from the complete protein sequence, so it is
challenging for alignment-free methods to recognize homologous domains in
multi-domain proteins when the proteins are homologous in only one of their
domains.
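The fixed-dimension requirement, and the information loss it can cause, can be illustrated with k-mer composition vectors, a simple alignment-free feature assumed here purely for illustration. In the hypothetical helper below, every protein maps to a vector of 20^k frequencies regardless of its length; with k = 1, a sequence and its reversal become indistinguishable because residue order is discarded.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet: 20 standard amino acids


def composition_vector(seq: str, k: int = 1) -> list:
    """Hypothetical fixed-length feature: relative frequency of every possible
    k-mer, giving a vector of dimension 20**k for any len(seq) >= k."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = len(seq) - k + 1
    # Enumerate all 20**k k-mers in a fixed order so every protein shares one coordinate system.
    return [counts["".join(p)] / total for p in product(AMINO_ACIDS, repeat=k)]


if __name__ == "__main__":
    short = "MKVLAAGIVG"  # toy sequences, not real proteins
    longer = "MKVLAAGIVGLLSTAQPEWRDNHYF"
    print(len(composition_vector(short)), len(composition_vector(longer)))  # 20 20
    print(len(composition_vector(short, k=2)))  # 400
    # Residue order is lost: reversing the sequence leaves the k = 1 vector unchanged,
    print(composition_vector(short) == composition_vector(short[::-1]))  # True
    # while dipeptide (k = 2) counts retain some, but not all, local order.
    print(composition_vector(short, 2) == composition_vector(short[::-1], 2))  # False
```

Larger k preserves more local order, but the dimension grows as 20^k (8,000 for k = 3), so richer order information comes at the cost of much sparser vectors.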