SVMs are widely used in microarray analysis, for example in exploring the nature
of host-microbe interactions ( Cummings and Relman, 2000 ), and in many other
bioinformatics problems, including the prediction of mitochondrial proteins
( Kumar et al. , 2006 ), prokaryotic gene finding ( Krause et al. , 2007 ), protein
functional classification ( Cai et al. , 2003 ), protein subcellular localisation
( Bhasin et al. , 2005; Gardy et al. , 2005; Gardy and Brinkman, 2006 ) and even
the tracking of the source of microbes in heavily polluted water ( Belanche-Muñoz
and Blanch, 2008 ).
Software Availability
SVMlight: http://svmlight.joachims.org/. Free for scientific use; source code and binaries
available.
Gismo (Gene Identification Using a Support Vector Machine for ORF Classification):
http://www.cebitec.uni-bielefeld.de/groups/brf/software/gismo/. Source code in Perl; requires
a local installation of Perl plus a number of Perl modules.
SVMProt: http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi. Protein functional family prediction.
5.2 Hidden Markov models
Hidden Markov models (HMMs) were first introduced in the 1960s ( Baum and
Petrie, 1966 ), and have been applied to the analysis of time-dependent data in fields
such as cryptanalysis, speech recognition and speech synthesis. Their applicability
to problems in bioinformatics became apparent in the late 1990s ( Krogh, 1998 ).
HMMs are frequently used for the statistical analysis of multiple DNA sequence
alignments. They can be used to identify genomic features such as ORFs, insertions,
deletions, substitutions and protein domains, amongst many others. HMMs can also
be used to identify homologies; the widely used Pfam database ( Punta et al. , 2012 ),
for example, is a database of protein families identified using HMMs. HMMs can be
significantly more accurate than the workhorse of sequence comparison tools,
BLAST (Basic Local Alignment Search Tool), first produced in 1990 ( Altschul
et al. , 1990, 1997 ).
An HMM is a statistical model of a sequence. It consists of a library of symbols
making up the sequence, and a set of states that an element of the sequence might
occupy. Each state has a set of weighted transition probabilities: the probabilities of
moving to each possible next state. A transition probability depends solely upon the
previous state; states prior to the previous state have no effect on transition
probabilities. Each state also has a set of emission probabilities: the probabilities of
producing each particular element of the sequence ( Figure 2.7 ). A model is trained
on known sequences to optimise these weights, and can then be applied to unknown
sequences in order to make predictions. Since several state paths through an HMM
may produce the same sequence, paths are ranked by likelihood, computed by
multiplying all of the transition and emission probabilities along the path together
and taking the logarithm of the result. The Viterbi algorithm ( Forney, 1973 )
efficiently recovers the single most likely state path for a given sequence.
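As a sketch, the mechanics described above can be made concrete with a toy two-state nucleotide HMM and a minimal Viterbi decoder. Everything here is invented for illustration (the state names, the DNA alphabet and all probability values are assumptions, not taken from any cited tool); log-probabilities are used so that multiplying many small probabilities does not underflow:

```python
import math

# Hypothetical two-state HMM: hidden states model GC-rich vs AT-rich
# genomic regions; the observed symbols are DNA bases. All parameters
# below are invented for illustration only.
states = ("GC-rich", "AT-rich")
start_p = {"GC-rich": 0.5, "AT-rich": 0.5}
trans_p = {
    "GC-rich": {"GC-rich": 0.8, "AT-rich": 0.2},
    "AT-rich": {"GC-rich": 0.2, "AT-rich": 0.8},
}
emit_p = {
    "GC-rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "AT-rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

def viterbi(seq):
    """Return (log-likelihood, most likely state path) for seq."""
    # Initialise with start and emission log-probabilities.
    v = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(seq)):
        v.append({})
        back.append({})
        for s in states:
            # The best score depends only on the immediately preceding
            # state -- the Markov property described in the text.
            prev, score = max(
                ((p, v[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda x: x[1],
            )
            v[t][s] = score + math.log(emit_p[s][seq[t]])
            back[t][s] = prev
    # Trace back the highest-scoring state path.
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(seq) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return v[-1][last], path

score, path = viterbi("GGCGATAT")
```

With these invented parameters, the decoder labels the GC-heavy prefix and the AT-heavy suffix with the corresponding states, and `score` is the log-likelihood of that single best path, in line with the path-ranking procedure described above.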