Introduction - Protein Homology Detection Through Alignment of Markov Random Fields


evolution), then the amino acid substitution score or mutation potential between $i$ and $j$ is defined as follows.

$$M(i,j) = \log \frac{p_{rel}(i,j)}{p_i \, p_j}$$
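As a rough illustration of the definition above, the following sketch estimates the mutation potential from a toy list of aligned residue pairs. The function name `mutation_potential`, the symmetric counting scheme, and the toy data are assumptions made for this example only; real substitution matrices are estimated from large curated sets of alignments.

```python
import math
from collections import Counter

def mutation_potential(aligned_pairs):
    """Estimate M(i, j) = log(p_rel(i, j) / (p_i * p_j)) from aligned residue pairs."""
    pair_counts = Counter()
    residue_counts = Counter()
    for a, b in aligned_pairs:
        # Count each pair in both orders so that M(a, b) == M(b, a).
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
        residue_counts[a] += 1
        residue_counts[b] += 1
    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    scores = {}
    for (a, b), count in pair_counts.items():
        p_rel = count / total_pairs                 # joint frequency in related pairs
        p_a = residue_counts[a] / total_residues    # background frequency of a
        p_b = residue_counts[b] / total_residues    # background frequency of b
        scores[(a, b)] = math.log(p_rel / (p_a * p_b))
    return scores

# Hypothetical toy data: residue pairs taken from alignments of related proteins.
toy_pairs = [("A", "A"), ("A", "A"), ("A", "A"), ("L", "L"), ("L", "L"), ("A", "L")]
M = mutation_potential(toy_pairs)
```

In this toy data set, identical pairs occur more often than their background frequencies predict, so `M[("A", "A")]` is positive, while the rarely observed substitution `M[("A", "L")]` is negative.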

The higher $M(i,j)$ is, the more likely the two aligned residues are related. Summing $M(i,j)$ over all aligned positions and subtracting the gap penalty yields a quality score for an alignment. $M(i,j)$ can also be interpreted in a different way.

By Bayes' rule, we have the following equation.

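$$P(\text{related} \mid i,j) = \frac{P(\text{related}) \, P(i,j \mid \text{related})}{P(i,j)} = \frac{P(\text{related}) \, p_{rel}(i,j)}{p_i \, p_j} \tag{1.2}$$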

If we assume that the prior probability $P(\text{related})$ is a constant, Eq. (1.2) implies that the log-likelihood of two aligned residues being evolutionarily related equals their mutation potential plus a constant, since $\log P(\text{related} \mid i,j) = M(i,j) + \log P(\text{related})$.

Given a scoring function, a dynamic programming algorithm can generate an alignment between two protein sequences that is guaranteed to optimize the scoring function. However, the running time of dynamic programming is proportional to the product of the lengths of the two proteins being aligned, which may be too slow for many homology search applications, especially when a large database of subject proteins is searched for homologs. To speed up the search, heuristic methods such as BLAST have been developed to generate suboptimal alignments and detect close homologs much more efficiently.
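The dynamic programming step can be sketched as a Needleman-Wunsch style global alignment. This is a minimal illustration under stated assumptions, not the method of any particular tool: `global_alignment_score`, `toy_score`, and the linear gap penalty are all hypothetical choices for this example, and the traceback that recovers the alignment itself is omitted.

```python
def global_alignment_score(seq_a, seq_b, score, gap_penalty):
    """Best global alignment score via dynamic programming.

    score(a, b) is a hypothetical substitution function standing in for M(i, j);
    gap_penalty is a positive cost charged per gapped position (linear gaps).
    """
    n, m = len(seq_a), len(seq_b)
    # best[i][j] holds the best score for aligning seq_a[:i] with seq_b[:j].
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = -gap_penalty * i
    for j in range(1, m + 1):
        best[0][j] = -gap_penalty * j
    # The nested loops fill (n + 1) * (m + 1) cells, hence the cost
    # proportional to the product of the two sequence lengths.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]),  # align two residues
                best[i - 1][j] - gap_penalty,  # gap in seq_b
                best[i][j - 1] - gap_penalty,  # gap in seq_a
            )
    return best[n][m]

def toy_score(a, b):
    # Hypothetical stand-in for a substitution matrix: reward identities only.
    return 2.0 if a == b else -1.0
```

For instance, `global_alignment_score("ACD", "AD", toy_score, 2.0)` aligns two identical residue pairs and pays for one gap. Because every query-subject pair requires filling a full table, repeating this over a large database is what motivates heuristics such as BLAST.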

1.4.2 Profile-Based Alignment for Homology Detection and Fold Recognition

To improve remote homology detection, profile-based protein comparison was developed. The sequence profile of a protein encodes its evolutionary information and is built from a set of close homologs. That is, instead of aligning primary sequences, we may align or compare two protein sets, each containing close homologs of the protein in question. To make comparison or alignment easy, a set of protein homologs is usually represented as a sequence profile. The use of sequence profiles has increased the sensitivity of homology detection threefold over pure sequence comparison [47].
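One simple way to picture a sequence profile is as a table of per-column residue frequencies computed from aligned homologs. The sketch below assumes the homologs are already aligned to equal length with `-` marking gaps; the function name `sequence_profile` and the toy alignment are hypothetical, and the sketch leaves out refinements such as sequence weighting and pseudocounts that practical profile builders typically apply.

```python
from collections import Counter

def sequence_profile(msa, gap_char="-"):
    """Per-column residue frequencies for a list of equal-length aligned sequences."""
    if not msa:
        return []
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        # Gaps are skipped, so frequencies are taken over observed residues only.
        counts = Counter(row[col] for row in msa if row[col] != gap_char)
        total = sum(counts.values())
        # A column made up entirely of gaps yields an empty distribution.
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile

# Hypothetical toy alignment of four homologs.
toy_msa = ["ACD", "ACE", "A-D", "GCD"]
profile = sequence_profile(toy_msa)
```

Each entry of `profile` describes how conserved a position is: the first column of the toy alignment is mostly `A`, while the second contains only `C`. Comparing such per-position distributions, rather than single residues, is what lets profile methods pick up more distant relationships.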

PSI-BLAST can be used to generate the sequence profile of a protein. PSI-BLAST finds close homologs of this protein in a large sequence database such as the NCBI non-redundant (NR) database [48], builds a multiple sequence alignment (MSA) of these homologs, and then converts the MSA to a sequence profile. For example,
