Introduction - Protein Homology Detection Through Alignment of Markov Random Fields

profile HMM. Other sequence-profile alignment tools include DIALIGN [42] and FPS [43]. HHpred [44], FORTE [45], and PICASSO [46] are some tools that use profile-profile alignment for homology detection. They have shown better performance than sequence-sequence or sequence-profile methods for remote homology detection.

In the following sections, we will present more details on the state-of-the-art methods for alignment-based homology detection and fold recognition.

1.4.1 Sequence Alignment for Homology Detection and Fold Recognition

Sequence alignment is a basic method for homology detection. The underlying reason is that two homologous proteins should align well to each other (i.e., there are many conserved residues and few gaps in their alignment). As such, we can infer the relationship between two proteins from the quality of their sequence alignment. A simple rule is that if the sequence identity of two proteins (i.e., the percentage of identical residues in their alignment) is high (e.g., >40%), then it is very likely that they are homologous. The limitation of sequence alignment is that it cannot reliably detect a homologous relationship when the proteins under study are not closely related. In particular, homology detection by sequence alignment is not very reliable when the similarity of two proteins falls into the twilight zone, i.e., when their sequence identity is less than 25%. However, in many cases two proteins sharing low sequence identity may still be homologous and share important structural and functional properties.
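The identity rule above can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function names `sequence_identity` and `classify` are hypothetical, the two aligned strings are made-up toy data, and the choice to count only columns where both sequences have a residue is an assumption (other denominators are also in use). The "inconclusive" label for identities between 25% and 40% is likewise an assumption; the text only names the two outer regimes.

```python
def sequence_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues among columns without a gap.

    Assumption: the denominator is the number of columns in which both
    sequences carry a residue; other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def classify(identity: float) -> str:
    """Apply the thresholds quoted in the text (>40% and <25%)."""
    if identity > 40.0:
        return "likely homologous"
    if identity < 25.0:
        return "twilight zone"
    # Hypothetical label for the range the text does not name.
    return "inconclusive"


if __name__ == "__main__":
    # Toy alignment (made-up sequences): 9 gap-free columns, 8 identical.
    ident = sequence_identity("MKT-AYIAKQR", "MKTLAYLA-QR")
    print(f"{ident:.1f}% identity: {classify(ident)}")
```

Note that a "twilight zone" result does not mean the proteins are unrelated; as the paragraph above states, it only means that identity alone cannot settle the question.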

The quality of an alignment cannot be accurately judged by sequence identity alone. Instead, a more sophisticated score is needed to quantify protein similarity based on the given alignment. A typical scoring function calculates the ratio between the likelihood of two proteins being homologous (or evolutionarily related) and that of their being non-homologous (or evolutionarily unrelated). We can use two amino acid substitution models to estimate the probability of two proteins being homologous and non-homologous, respectively. The probability model for "non-homologous" is also called the null model; it describes the case in which two aligned residues are evolutionarily unrelated. Let X and Y denote the amino acid types of two aligned residues. The null model calculates the probability of X and Y being aligned as $P(X = i, Y = j) = P_i P_j$, where $P_i$ and $P_j$ are the background probabilities of amino acids i and j, respectively. A few probability models such as PAM [39] and BLOSUM [39] have been developed to estimate how likely two aligned residues are to be evolutionarily related. PAM estimates the relatedness of two aligned residues starting from single point mutations. BLOSUM derives its amino acid substitution model from blocks of multiple sequence alignments.
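The likelihood-ratio idea can be illustrated with a small Python sketch. Everything numeric here is made up for illustration: the three-letter alphabet, the `background` frequencies, and the `p_rel` table of joint probabilities for related residue pairs are toy values, not taken from PAM or BLOSUM. The use of a base-2 logarithm, the summation of per-column scores, and the skipping of gap columns are also assumptions of this sketch.

```python
import math

# Toy background frequencies P_i (illustrative values only).
background = {"A": 0.5, "L": 0.3, "W": 0.2}

# Toy joint probabilities for evolutionarily related pairs
# (illustrative values only; they sum to 1 and match the background
# frequencies as marginals).
p_rel = {
    ("A", "A"): 0.40, ("L", "L"): 0.22, ("W", "W"): 0.14,
    ("A", "L"): 0.06, ("L", "A"): 0.06,
    ("A", "W"): 0.04, ("W", "A"): 0.04,
    ("L", "W"): 0.02, ("W", "L"): 0.02,
}


def null_probability(i: str, j: str) -> float:
    """Null model: unrelated residues pair with probability P_i * P_j."""
    return background[i] * background[j]


def log_odds(i: str, j: str) -> float:
    """Log of the ratio between the related and the null probability."""
    return math.log2(p_rel[(i, j)] / null_probability(i, j))


def alignment_score(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Sum per-column log-odds scores.

    Assumption: columns are treated as independent and gap columns are
    skipped; a real scoring scheme would also charge gap penalties.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(log_odds(a, b)
               for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap)


if __name__ == "__main__":
    for pair in [("A", "A"), ("A", "L"), ("W", "W")]:
        print(pair, round(log_odds(*pair), 3))
    print("score:", round(alignment_score("AW-L", "AWAL"), 3))
```

With these toy numbers, a positive score means the pair is seen more often among related residues than the null model predicts, and a negative score means it is seen less often.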

Under the null model, $P(X = i, Y = j) = P_i P_j$. Let $P_{rel}(i, j)$ denote the probability of two aligned residues $X = i$ and $Y = j$ being evolutionarily related (i.e., one substitutes the other during
