provide a more meaningful score than GoFigure, one that takes into account estimated accuracy based on some characteristics of the search results. Because accuracy is estimated separately for each GO term, the approach also accounts for differences in the background frequency of each term. GOtcha is available as a web service at http://www.compbio.dundee.ac.uk/gotcha/gotcha.php.
GOAnno [55] takes a different approach to using sequence homology for function inference. Given a query protein's sequence, PipeAlign [56] is used to search for its homologues and to construct a multiple alignment of complete sequences (MACS) that consists of clusters of homologues, each representing a potential functional subgroup. GO terms are
then assigned based on three sets of annotations. The first set, the initial protein gene ontology (IPO), is the set of annotations already known for the query gene. The second set, the proximal protein gene ontology (PPO), is the set of GO terms annotated to proteins that share at least 98% sequence identity with the query protein. The last set, the mean subfamily gene ontology (MSO), is the set of GO terms annotated to sequences in the subgroups detected by PipeAlign whose NorMD [57] multiple sequence alignment score exceeds 0.3. Each term is scored by the number of homologous proteins that are annotated with the term or with one of its descendant terms, and thresholds are imposed to remove GO branches that are associated with too few proteins. The three sets of annotations are then combined to obtain the final predicted GO terms. GOAnno is available as a web service at http://bips.u-strasbg.fr/GOAnno/GOAnno.html.
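The term-counting rule described above can be illustrated with a short Python sketch. It is not GOAnno's implementation: the GO identifiers, the toy ontology in PARENTS, the protein names, and the min_count threshold are all hypothetical, and only the 98% identity cutoff for the PPO set is taken from the description. Each homologue's annotations are first propagated to every ancestor term, so a protein annotated with a descendant also counts toward that descendant's parents.

```python
from collections import Counter

# Toy GO fragment (hypothetical IDs): child term -> set of parent terms.
PARENTS = {
    "GO:A": set(),
    "GO:B": {"GO:A"},
    "GO:C": {"GO:B"},
    "GO:D": {"GO:B"},
    "GO:E": {"GO:A"},
}


def with_ancestors(term, parents):
    """Return the term together with every ancestor reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def score_terms(homologues, parents, min_count=2):
    """Score each term by the number of homologues annotated with it or a descendant.

    Terms supported by fewer than min_count proteins are dropped
    (min_count is a hypothetical threshold, not GOAnno's setting).
    """
    counts = Counter()
    for terms in homologues.values():
        propagated = set()
        for term in terms:
            propagated |= with_ancestors(term, parents)
        counts.update(propagated)  # a set, so each protein counts once per term
    return {term: n for term, n in counts.items() if n >= min_count}


def proximal_terms(homologues, identity, cutoff=98.0):
    """PPO-style set: terms of homologues with at least `cutoff` percent identity."""
    ppo = set()
    for protein, terms in homologues.items():
        if identity[protein] >= cutoff:
            ppo |= terms
    return ppo


# Made-up homologues of a query protein and their percent identity to it.
homologues = {"P1": {"GO:C"}, "P2": {"GO:D"}, "P3": {"GO:E"}}
identity = {"P1": 99.0, "P2": 60.0, "P3": 45.0}
print(sorted(score_terms(homologues, PARENTS).items()))  # [('GO:A', 3), ('GO:B', 2)]
print(proximal_terms(homologues, identity))              # {'GO:C'}
```

Here GO:B receives two votes even though no homologue is annotated with it directly, because P1 and P2 carry its descendants GO:C and GO:D.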
GOPET [3] takes a machine-learning approach to function prediction from sequence homology. To build training data, a large number of sequences are searched against a database of GO-annotated sequences. For each query, the GO terms annotated to each homologue found are used as training examples; a term is deemed a positive example if it is annotated to the query protein and negative otherwise. Each term is described by a number of features, such as the E-value, bit score and sequence identity of the alignment, as well as the background frequency of the term and the evidence codes used for its annotation. The training examples are then split randomly into smaller subsets, each of which is used to build a separate support vector machine (SVM) classifier. To predict functions for a given query protein sequence, homologous proteins are obtained using BLAST, and each GO term annotated to these proteins is scored by computing the same features for it and having each classifier label it as positive or negative. The votes from the classifiers are summed to obtain the final score. The authors of GOPET compared the method against GOtcha and found that the two performed comparably. GOPET is available as a web service at http://genius.embnet.dkfz-heidelberg.de/menu/biounit/open-husar.
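The train-on-subsets-and-vote scheme can be sketched in plain Python. This is a hedged illustration rather than GOPET itself: the feature set and its scaling, the toy data, and the use of the Pegasos sub-gradient algorithm to train each linear SVM (with no bias term, relying on centred features instead) are all assumptions made for the example. Each subset of training examples yields one classifier, and a candidate GO term's score is the number of classifiers that vote for it.

```python
import math
import random


def features(evalue, bitscore, identity, bg_freq):
    """Feature vector for one (homologue hit, GO term) pair.

    Hypothetical scaling: each raw value is centred on an arbitrary
    reference point, which stands in for a separate bias term.
    """
    log_e = -math.log10(max(evalue, 1e-300))  # floor avoids log10(0)
    return [
        (log_e - 10.0) / 10.0,       # -log10(E-value), centred at E = 1e-10
        (bitscore - 100.0) / 100.0,  # alignment bit score
        (identity - 40.0) / 40.0,    # percent sequence identity
        (bg_freq - 0.1) * 10.0,      # background frequency of the GO term
    ]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(examples, lam=0.01, epochs=50, seed=0):
    """Linear SVM trained with Pegasos (stochastic sub-gradient descent
    on the L2-regularised hinge loss). Labels must be +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(examples[0][0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(examples)))
        rng.shuffle(order)
        for i in order:
            t += 1
            x, y = examples[i]
            eta = 1.0 / (lam * t)
            violates = y * dot(w, x) < 1.0            # inside margin or misclassified
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularisation shrink
            if violates:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def train_ensemble(examples, n_models=3, seed=0):
    """Randomly split the examples into n_models subsets and train one SVM each.

    Each class is dealt out separately so every subset contains both labels.
    """
    rng = random.Random(seed)
    pos = [e for e in examples if e[1] > 0]
    neg = [e for e in examples if e[1] < 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    parts = [pos[k::n_models] + neg[k::n_models] for k in range(n_models)]
    return [train_linear_svm(p, seed=seed + k) for k, p in enumerate(parts) if p]


def vote(models, x):
    """Final score: number of classifiers that label the term as positive."""
    return sum(1 for w in models if dot(w, x) > 0)


# Made-up training data: terms from strong hits are correct (+1),
# terms from weak hits are not (-1).
examples = [(features(e, b, i, 0.01), +1)
            for e in (1e-30, 1e-50) for b in (250, 350) for i in (70, 90)]
examples += [(features(e, b, i, 0.3), -1)
             for e in (0.5, 2.0) for b in (30, 50) for i in (20, 30)]
models = train_ensemble(examples, n_models=3)
print(vote(models, features(1e-40, 300, 80, 0.01)))  # prints 3 (strong hit)
print(vote(models, features(1.0, 40, 25, 0.3)))      # prints 0 (weak hit)
```

Dealing each class out separately is a design choice made here so that every small subset sees both labels; a purely random split could occasionally leave a subset with a single class, and its classifier would then vote the same way on every term.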
9.2.3.3 Remote homology
PFP [6] (http://dragon.bio.purdue.edu/pfp/) improves upon existing sequence-based approaches by extending the homology search beyond highly similar sequences to more distantly related ones. Instead of BLAST, it uses Position-Specific Iterated BLAST (PSI-BLAST) [58]. PSI-BLAST performs an initial BLAST search with the query sequence and builds a multiple sequence alignment of the close homologues it discovers, using the query sequence as a template. This alignment is then used to create a profile that captures the amino acid variation observed at each position of the alignment. The profile, which reflects a model of the homologues found in the BLAST search, is then used to search against the sequences in the database with a slightly modified BLAST algorithm, and this search-and-rebuild cycle can be repeated over several iterations.
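The core idea of a position-specific profile can be shown with a simplified Python sketch. It assumes an ungapped alignment, a uniform background distribution, a flat pseudocount, and no sequence weighting; PSI-BLAST itself differs on all of these points and also performs gapped searches and repeated iterations, which are omitted here. The function names and the example sequences are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds profile (one dict per column) from an ungapped alignment.

    Simplifying assumptions: uniform background frequencies, no sequence
    weighting, and the same pseudocount added to every residue count.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(alignment[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in alignment:
            if seq[col] in counts:  # ignore gaps and non-standard residues
                counts[seq[col]] += 1
        total = sum(counts.values())
        # log2 odds of observing each residue at this position vs. background
        pssm.append({aa: math.log2((n / total) / background)
                     for aa, n in counts.items()})
    return pssm


def best_ungapped_score(pssm, sequence):
    """Slide the profile along a database sequence; return the best window score."""
    width = len(pssm)
    best = float("-inf")
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i].get(aa, 0.0) for i, aa in enumerate(window))
        best = max(best, score)
    return best


# Query first, then close homologues from an initial search (made-up data).
alignment = ["MKVLA", "MKILA", "MRVLA", "MKVIA"]
pssm = build_pssm(alignment)
print(round(best_ungapped_score(pssm, "GGMKVLAGG"), 2))  # conserved motif present
print(round(best_ungapped_score(pssm, "GGWWWWWGG"), 2))  # unrelated sequence
```

Because conserved columns (here the M and A positions) receive higher log-odds than variable ones, a database sequence that matches the conserved positions scores well even if it differs from the query elsewhere, which is what lets a profile reach more distant homologues than a single-sequence search.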