Column 6: The accumulative edge alignment potential.
Column 7: Range of the aligned region in the query protein.
Column 8: Range of the aligned region in the subject protein.
Column 9: The length of the subject protein.
Column 10: The number of aligned positions in the pairwise alignment.
Column 11: The number of gaps in the subject protein (insertions).
Column 12: The number of gaps in the query protein (deletions).
Column 13: The number of identical residues in the alignment.
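The column layout above can be read programmatically. The following is a minimal Python sketch for parsing one row. It rests on hypothetical assumptions that this section does not confirm: rows are whitespace-separated, columns 1 to 5 (not described here) are kept as raw strings, and the ranges in columns 7 and 8 are written as `start-end`. The names `RankingHit`, `parse_row` and `parse_range` are illustrative only and are not part of MRFsearch. Two convenience ratios are derived from the documented columns: sequence identity (column 13 / column 10) and subject coverage (column 10 / column 9).

```python
from dataclasses import dataclass


@dataclass
class RankingHit:
    """Columns 6-13 of one ranking-file row (hypothetical layout)."""
    leading: list[str]              # columns 1-5, not described in this section
    edge_potential: float           # column 6: accumulative edge alignment potential
    query_range: tuple[int, int]    # column 7: aligned region in the query
    subject_range: tuple[int, int]  # column 8: aligned region in the subject
    subject_length: int             # column 9
    aligned: int                    # column 10: aligned positions
    insertions: int                 # column 11: gaps in the subject protein
    deletions: int                  # column 12: gaps in the query protein
    identical: int                  # column 13: identical residues

    @property
    def identity(self) -> float:
        """Fraction of aligned positions that are identical residues."""
        return self.identical / self.aligned if self.aligned else 0.0

    @property
    def subject_coverage(self) -> float:
        """Fraction of the subject protein covered by aligned positions."""
        return self.aligned / self.subject_length if self.subject_length else 0.0


def parse_range(field: str) -> tuple[int, int]:
    # Assumed format "start-end", e.g. "3-118".
    start, end = field.split("-")
    return int(start), int(end)


def parse_row(line: str) -> RankingHit:
    # Assumed: whitespace-separated fields, at least 13 of them.
    f = line.split()
    if len(f) < 13:
        raise ValueError(f"expected at least 13 columns, got {len(f)}")
    return RankingHit(
        leading=f[:5],
        edge_potential=float(f[5]),
        query_range=parse_range(f[6]),
        subject_range=parse_range(f[7]),
        subject_length=int(f[8]),
        aligned=int(f[9]),
        insertions=int(f[10]),
        deletions=int(f[11]),
        identical=int(f[12]),
    )
```

For a made-up row such as `"a b c d e 12.5 3-118 1-114 130 110 6 4 33"`, `parse_row(...).identity` gives 33/110 = 0.3.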
3.5 Interpreting P-Value
In the ranking file, the P-value can be interpreted as a confidence score indicating the relative quality of the top-ranked proteins and their corresponding alignments. To calculate the P-value, we employ a set of reference proteins (in …), which consists of 1,800 single-domain proteins belonging to different SCOP folds. Given a query protein, we first align it to this reference protein database and then estimate an extreme value distribution from the 1,800 alignment scores. Based upon this distribution, we calculate the P-value of each alignment when aligning the query protein to the subject protein database. In effect, the P-value measures how likely each subject protein is to be homologous to the query protein by comparing its alignment score with those of the reference proteins.
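The P-value procedure above can be sketched in Python. This section does not say which extreme value family or fitting method MRFsearch uses, so the sketch assumes a Gumbel (type I extreme value) distribution fitted by the method of moments, and assumes that higher alignment scores are better. The function names are hypothetical, and `synthetic_reference_scores` only generates illustrative stand-in data in place of the 1,800 real reference alignment scores.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel(scores: list[float]) -> tuple[float, float]:
    """Method-of-moments fit of a Gumbel (maximum, type I EVD) distribution.

    Returns (mu, beta), the location and scale. For a Gumbel variable,
    mean = mu + gamma * beta and std = beta * pi / sqrt(6).
    """
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.fmean(scores) - EULER_GAMMA * beta
    return mu, beta


def gumbel_pvalue(score: float, mu: float, beta: float) -> float:
    """P(S >= score) = 1 - exp(-exp(-(score - mu) / beta)), higher score = better."""
    z = (score - mu) / beta
    t = math.exp(min(-z, 700.0))  # clamp to avoid overflow for very low scores
    return -math.expm1(-t)        # stays accurate when t is tiny (very high scores)


def synthetic_reference_scores(n: int = 1800, mu: float = 20.0,
                               beta: float = 3.0, seed: int = 0) -> list[float]:
    """Illustrative stand-in for the reference alignment scores (not real data)."""
    rng = random.Random(seed)
    # Inverse-CDF sampling of a Gumbel: x = mu - beta * ln(-ln u), u in (0, 1).
    return [mu - beta * math.log(-math.log(rng.uniform(1e-12, 1 - 1e-12)))
            for _ in range(n)]
```

Usage: `mu, beta = fit_gumbel(reference_scores)` is computed once per query, after which `gumbel_pvalue(score, mu, beta)` gives the tail probability of each subject alignment score. Maximum-likelihood fitting (for example `scipy.stats.gumbel_r.fit`) is a common alternative to the method of moments.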
To examine the relationship between the P-value and how close the protein found by MRFsearch is to the query protein, we conduct an experiment on the 368 CAMEO target proteins. For each CAMEO target, the first-ranked protein in the database is treated as the homolog of this target. To measure the quality of an alignment, we use the un-normalized Global Distance Test (uGDT). GDT has been employed as an official measure of protein model quality by CASP for many years. When applied to alignments, uGDT can be interpreted as the number of correctly aligned positions in an alignment, weighted by the alignment quality at each position. We say an alignment is good when its uGDT is larger than 50. We use 50 as a cutoff because many proteins similar only at the fold level have uGDT around 50. Figure 3.2 shows the relationship between P-value and uGDT on the 368 CAMEO targets. Figure 3.3 is a zoom-in of Fig. 3.2, showing
the relationship between P-value and uGDT on the 132 CAMEO targets with -log(P-value) < 20. As shown in Fig. 3.3, when the P-value is small (i.e. < 10^-10), most alignments have uGDT greater than or equal to 50. That is, when the P-value is less than 10^-10, the first-ranked protein is very likely to share a similar fold with the query protein. When the P-value is between 10^-10 and 10^-5, more than half of the alignments have uGDT > 50.
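The uGDT measure can be illustrated with the GDT_TS formula used in CASP, which averages the percentage of residues within 1, 2, 4 and 8 Å of their native positions over those four cutoffs. The un-normalized sketch below averages counts instead of percentages, so a position within 1 Å contributes 1 and a position within 8 Å but not within 4 Å contributes 0.25, which matches the reading of uGDT as a weighted number of correctly aligned positions. It is a sketch under stated assumptions: it takes per-position deviations (in Å) for one already-computed superposition, whereas GDT proper searches over many superpositions and keeps the best count for each cutoff. The function names are hypothetical; the cutoff of 50 is the one used in the text.

```python
# Distance cutoffs (Angstrom) used by CASP's GDT_TS.
GDT_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def ugdt(deviations: list[float]) -> float:
    """Un-normalized GDT_TS-style score for one alignment (sketch).

    deviations: distance in Angstrom between each aligned query residue's
    modeled position and its native position, after ONE fixed superposition
    (assumption: real GDT maximizes over many superpositions).
    """
    # Count aligned positions under each cutoff, then average the four counts.
    counts = [sum(1 for d in deviations if d <= c) for c in GDT_CUTOFFS]
    return sum(counts) / len(GDT_CUTOFFS)


def is_good_alignment(deviations: list[float], cutoff: float = 50.0) -> bool:
    """An alignment is called good when its uGDT exceeds the cutoff (50 in the text)."""
    return ugdt(deviations) > cutoff


if __name__ == "__main__":
    # Hypothetical alignment: 40 positions within 1 A, 20 more within 2 A,
    # 20 more within 4 A, 20 more within 8 A, and 10 beyond 8 A.
    devs = [0.5] * 40 + [1.5] * 20 + [3.0] * 20 + [6.0] * 20 + [12.0] * 10
    print(ugdt(devs), is_good_alignment(devs))
```

For the hypothetical alignment in the example, uGDT = (40 + 60 + 80 + 100) / 4 = 70, so it counts as a good alignment under the cutoff of 50.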