Software - Protein Homology Detection Through Alignment of Markov Random Fields


3.4 MRFsearch Ranking File

After the database search, MRFsearch generates a ranking file like the one shown in Fig. 3.1.

The first line contains the query protein name, the second line shows the query protein sequence, and the third line gives the query protein length. The fourth line reports the NEFF (number of effective sequence homologs), which is derived from the Shannon "sequence entropy" of the PSI-BLAST sequence profile. NEFF is the average number of amino acid (AA) substitutions across all residues of a protein and ranges from 1 to 20 (i.e., the number of AA types). NEFF at one residue is calculated as

$$\mathrm{NEFF} = \exp\left(-\sum_{k} p_k \ln p_k\right)$$

where $p_k$ is the probability of the $k$th AA type (k = 1, ..., 20) at that residue, and NEFF for the whole protein is the average across all residues. Generally speaking, NEFF quantifies the homologous information available for a given protein: the larger the NEFF value, the more homologous information its profile contains. The fifth line contains the number of proteins searched by MRFsearch.
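The NEFF definition above can be sketched in a few lines of code. This minimal Python example assumes the profile is already given as one probability distribution over the 20 AA types per residue (each row summing to 1); how MRFsearch extracts those probabilities from PSI-BLAST output is not shown, and the function names `residue_neff` and `protein_neff` are hypothetical.

```python
import math
from typing import Sequence


def residue_neff(probs: Sequence[float]) -> float:
    """NEFF at one residue: exp of the Shannon entropy of its AA distribution."""
    # Zero probabilities are skipped, since p * ln(p) -> 0 as p -> 0.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return math.exp(entropy)


def protein_neff(profile: Sequence[Sequence[float]]) -> float:
    """Whole-protein NEFF: the mean of the per-residue NEFF values."""
    if len(profile) == 0:
        raise ValueError("profile must contain at least one residue")
    return sum(residue_neff(row) for row in profile) / len(profile)


if __name__ == "__main__":
    uniform = [1.0 / 20] * 20       # all 20 AA types equally likely
    conserved = [1.0] + [0.0] * 19  # only one AA type observed
    print(round(residue_neff(uniform), 6))               # 20.0
    print(residue_neff(conserved))                       # 1.0
    print(round(protein_neff([uniform, conserved]), 6))  # 10.5
```

The two extreme columns reproduce the stated range: a uniform distribution gives NEFF = 20, and a fully conserved residue gives NEFF = 1.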

The meaning of each column is explained as follows.

Column 1 'No': Rank of the protein among all searched proteins.
Column 2 'Proteins': Name of the protein (PDB ID or SCOP protein name) in the databases.
Column 3 'P-value': P-value of the alignment; the smaller, the better.
Column 4 'Score': Raw alignment score between the query and subject proteins.
Column 5 'Node': Accumulated node alignment potential.
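The header lines and columns described above could be read programmatically. The following Python sketch is a hypothetical parser: since the exact text layout of Fig. 3.1 is not reproduced here, it assumes (1) the first five non-blank lines hold the query name, sequence, length, NEFF, and number of searched proteins, each as the last whitespace-separated token on its line, and (2) each hit row is whitespace-separated and begins with the five columns No, Proteins, P-value, Score, and Node. The names `parse_ranking`, `significant_hits`, `Hit`, `Ranking`, the `SAMPLE` text, and the 1e-3 cutoff are illustrative and not part of MRFsearch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    rank: int       # Column 1 'No'
    protein: str    # Column 2 'Proteins' (PDB ID or SCOP protein name)
    p_value: float  # Column 3 'P-value' (smaller is better)
    score: float    # Column 4 'Score' (raw alignment score)
    node: float     # Column 5 'Node' (accumulated node alignment potential)


@dataclass
class Ranking:
    query_name: str
    query_sequence: str
    query_length: int
    neff: float
    num_searched: int
    hits: List[Hit]


def parse_ranking(text: str) -> Ranking:
    """Parse a ranking file in the ASSUMED (hypothetical) layout."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 5:
        raise ValueError("expected five header lines")
    # ASSUMPTION: each header value is the last whitespace-separated token.
    name, seq, length, neff, searched = (ln.split()[-1] for ln in lines[:5])
    hits = []
    for ln in lines[5:]:
        fields = ln.split()
        # ASSUMPTION: hit rows start with an integer rank and have >= 5 columns;
        # anything else (e.g. a column-title line) is skipped.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        hits.append(Hit(int(fields[0]), fields[1], float(fields[2]),
                        float(fields[3]), float(fields[4])))
    return Ranking(name, seq, int(length), float(neff), int(searched), hits)


def significant_hits(ranking: Ranking, max_p: float = 1e-3) -> List[Hit]:
    """Hits with P-value <= max_p, best (smallest P-value) first."""
    return sorted((h for h in ranking.hits if h.p_value <= max_p),
                  key=lambda h: h.p_value)


# Hypothetical input for illustration only; not the content of Fig. 3.1.
SAMPLE = """\
Query query1
Sequence ACDEFGHIK
Length 9
NEFF 4.2
Searched 3
No Proteins P-value Score Node
1 prot_a 1.0e-05 85.2 60.1
2 prot_b 4.0e-04 51.0 40.3
3 prot_c 2.0e-02 30.5 22.0
"""

if __name__ == "__main__":
    ranking = parse_ranking(SAMPLE)
    print(ranking.query_name, ranking.query_length, ranking.neff)  # query1 9 4.2
    print([h.protein for h in significant_hits(ranking)])  # ['prot_a', 'prot_b']
```

Filtering and sorting on the P-value rather than the raw Score follows the column description above, where a smaller P-value indicates a better hit; the `max_p` cutoff is an arbitrary illustrative choice.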

Fig. 3.1 An example ranking file generated by MRFsearch
