Biomedical Engineering Reference
The homologues found can in turn be used to refine the profile and make it more
representative. This process of profile building and homology search is iterated so that the profile
becomes more general and remote homologues with lower sequence similarity to the query
sequence can be found. Each GO term annotated to the homologues found by PSI-BLAST
is assigned a score in a similar manner to GoFigure and GOtcha, but taking into account
prior knowledge of the association between GO terms:
$$ s(f_a) = \sum_{i \in R} \sum_{j \in F_i} \left[ -\log(E(i)) + b \right] P(f_a / f_j) $$
where $R$ is the set of sequences found by PSI-BLAST that are above a threshold, $F_i$ is the set
of GO terms annotated to sequence $i$, $E(i)$ is the E-value of the alignment result associated
with sequence $i$, $b$ is a constant offset, and $P(f_a / f_j)$ is the conditional probability that a
protein is annotated with term $f_a$ given that it is annotated with term $f_j$.
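The scoring function above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the logarithm is taken as base 10, the offset b defaults to 2.0, the GO term names and E-values are made up, and the function name score_terms and the nested-dictionary layout of the conditional probabilities are hypothetical choices, not part of PFP itself.

```python
import math
from collections import defaultdict


def score_terms(hits, cond_prob, b=2.0):
    """Score GO terms from a list of PSI-BLAST hits.

    hits      : list of (e_value, set_of_annotated_terms), one entry per sequence in R
    cond_prob : cond_prob[f_j][f_a] = P(f_a / f_j) for f_a != f_j (assumed layout)
    b         : constant offset (the default 2.0 is an assumption)
    """
    scores = defaultdict(float)
    for e_value, terms in hits:
        # Alignment weight: -log(E(i)) + b, with an assumed base-10 logarithm.
        weight = -math.log10(e_value) + b
        for f_j in terms:
            # A term always implies itself, so P(f_j / f_j) = 1.
            associated = {f_j: 1.0, **cond_prob.get(f_j, {})}
            for f_a, p in associated.items():
                scores[f_a] += weight * p
    return dict(scores)


# Toy input: two hits and a small table of conditional probabilities.
hits = [
    (1e-10, {"GO:A", "GO:B"}),
    (1e-2, {"GO:B"}),
]
cond_prob = {
    "GO:A": {"GO:C": 0.5},   # P(GO:C / GO:A) = 0.5
    "GO:B": {"GO:A": 0.25},  # P(GO:A / GO:B) = 0.25
}
scores = score_terms(hits, cond_prob)
```

In this toy input, GO:C is not annotated to either hit, yet it still receives a non-zero score through its association with GO:A.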
The conditional probability $P(f_a / f_j)$ is computed from the annotations of a large
number of annotated proteins, and the collection of all conditional probabilities between
each pair of GO terms is collectively termed the function association matrix (FAM). The
incorporation of the FAM into the scoring function allows GO terms that are not
annotated to the homologues found by PSI-BLAST to be assigned to the query sequence.
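As a rough illustration of how such a matrix might be estimated, the sketch below counts co-annotations across a set of proteins and divides by the number of proteins carrying the conditioning term. The plain ratio of counts is an assumption made for simplicity (the text does not specify the estimator, and no smoothing is applied here), and the function name build_fam, the protein identifiers, and the GO term names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def build_fam(annotations):
    """Estimate P(f_a / f_j) from co-annotation counts.

    annotations : dict mapping protein id -> set of GO terms
    returns     : fam[f_j][f_a] = (#proteins with both f_a and f_j) / (#proteins with f_j)
    """
    term_count = Counter()
    pair_count = Counter()
    for terms in annotations.values():
        term_count.update(terms)
        # Ordered pairs (f_a, f_j) with f_a != f_j that co-occur on this protein.
        pair_count.update(permutations(sorted(terms), 2))

    fam = {}
    for (f_a, f_j), n_both in pair_count.items():
        fam.setdefault(f_j, {})[f_a] = n_both / term_count[f_j]
    return fam


# Toy annotation set with four proteins.
annotations = {
    "p1": {"GO:A", "GO:B"},
    "p2": {"GO:A", "GO:B"},
    "p3": {"GO:A"},
    "p4": {"GO:B", "GO:C"},
}
fam = build_fam(annotations)
```

Here GO:A and GO:B co-occur on two of the three proteins annotated with GO:B, so the estimated P(GO:A / GO:B) is 2/3. Pairs that never co-occur are simply absent from the result, which is equivalent to a probability of zero.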
The use of PSI-BLAST and the FAM allows PFP to yield significantly better recall than the
other approaches described above while achieving better accuracy than a standard
PSI-BLAST search.
9.2.4 Phylogenetic relationships
Besides using sequence directly, some approaches also explore phylogenetic relationships
for functional inference. Genes that participate in similar biological functions tend to be
conserved together during speciation. This is intuitive, as proteins do not work alone, but
rather form complexes, or interact with each other in biological pathways to perform their
functions. This observation forms the underlying principle for using phylogenetic
relationships to identify proteins with similar functions, which in turn can be used for function
prediction.
9.2.4.1 Phylogenetic profiles
The simplest and probably earliest way to use phylogeny for predicting gene function was
proposed by Pellegrini et al. [4]. For each gene, an n-bit binary vector known as a
phylogenetic profile is constructed. Each index of the vector represents a currently living organism.
A value of one is assigned to an index if the corresponding organism has a homologue of
the gene; a value of zero is assigned otherwise. The distance between two genes is captured
simply by the number of organisms in which their profiles differ (the Hamming distance), and genes
with profiles that differ by fewer than 3 bits are defined as neighbors. Using profiles representing
16 organisms (a 16-bit vector), Pellegrini et al. showed that genes with similar profiles tend
to be involved in similar biological functions, although some functionally diverse genes
still share similar profiles due to the limited resolution of 16-bit vectors. However, since
the number of possible profiles for an n-bit profile is $2^n$, each additional organism added