Column 6 'Edge': The accumulative edge alignment potential.
Column 7 'qRange': Range of aligned region in the query protein.
Column 8 'tRange': Range of aligned region in the subject protein.
Column 9 'tLength': The length of the subject protein.
Column 10 'Cols': The number of aligned positions in the pairwise alignment.
Column 11 '#tGaps': The number of gaps in the subject protein (Insertions).
Column 12 '#qGaps': The number of gaps in the query protein (Deletions).
Column 13 '#seqID': The number of identical residues in the alignment.
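As a rough illustration of how columns 6 to 13 might be consumed, the sketch below maps them to names and derives a fractional sequence identity. It assumes the ranking file is whitespace-delimited with one hit per row, which is not stated here. The function names, the placeholder values for columns 1 to 5, and the example row are all hypothetical, and the identity ratio (#seqID divided by Cols) is a derived convenience rather than a column of the file.

```python
# Names of columns 6-13 as described above (1-based column numbers).
COLUMNS_6_TO_13 = [
    "Edge", "qRange", "tRange", "tLength",
    "Cols", "#tGaps", "#qGaps", "#seqID",
]


def parse_columns_6_to_13(row):
    """Map columns 6-13 of one row to their names.

    Assumption: fields are separated by whitespace. Values are kept as
    strings because the exact format of each field is not specified here.
    """
    fields = row.split()
    if len(fields) < 13:
        raise ValueError("expected at least 13 columns, got %d" % len(fields))
    # Column 6 is index 5 in a 0-based list; column 13 is index 12.
    return dict(zip(COLUMNS_6_TO_13, fields[5:13]))


def sequence_identity(record):
    """Fraction of aligned positions that are identical (#seqID / Cols)."""
    cols = int(record["Cols"])
    return int(record["#seqID"]) / cols if cols else 0.0


# Made-up row: c1..c5 stand in for the first five columns, which are not
# described in this part of the text.
example_row = "c1 c2 c3 c4 c5 -12.5 1-120 3-125 130 118 2 5 40"
record = parse_columns_6_to_13(example_row)
identity = sequence_identity(record)
```

Keeping the raw strings and converting only the fields that are needed avoids guessing at the format of the range columns.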
3.5 Interpreting P-Value
In the ranking file, the P-value can be interpreted as a confidence score indicating the
relative quality of the top-ranked proteins and their corresponding alignments. To
calculate the P-value, we employ a set of reference proteins (in databases/CAL_TGT),
which consists of 1,800 single-domain proteins belonging to different SCOP folds.
Given a query protein, we first align it to this reference protein database and then
estimate an extreme value distribution from the 1,800 alignment scores. Based upon
this distribution, we calculate the P-value of each alignment when aligning the query
protein to the subject protein database. In effect, the P-value measures the likelihood
of each subject protein being homologous to the query protein by comparing it to the
reference proteins.
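The procedure above can be sketched in a few lines. The text does not say which extreme value distribution is used or how it is fitted, so this sketch assumes a Gumbel distribution fitted by the method of moments, and it assumes that a higher alignment score is better. The function names and the example scores are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel(reference_scores):
    """Fit a Gumbel (maximum) distribution by the method of moments.

    Uses mean = mu + gamma * beta and variance = (pi * beta)^2 / 6.
    Assumption: the reference scores are not all identical (beta > 0).
    """
    mean = statistics.fmean(reference_scores)
    std = statistics.stdev(reference_scores)
    beta = std * math.sqrt(6.0) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def p_value(score, mu, beta):
    """Probability that a reference alignment scores at least `score`.

    Survival function of the Gumbel distribution:
    1 - exp(-exp(-(score - mu) / beta)).
    """
    z = (score - mu) / beta
    if z < -700.0:
        # exp(-z) would overflow; the survival probability is 1 here.
        return 1.0
    # expm1 keeps precision when the P-value is very small.
    return -math.expm1(-math.exp(-z))


# Made-up scores standing in for the query-vs-reference alignment scores.
reference_scores = [10.2, 11.5, 9.8, 12.1, 10.9, 11.0, 9.5, 13.0, 10.4, 11.7]
mu, beta = fit_gumbel(reference_scores)
p_strong_hit = p_value(40.0, mu, beta)  # far above the reference scores
p_typical = p_value(11.0, mu, beta)     # within the reference range
```

A score far above the reference scores yields a P-value close to zero, while a score inside the reference range yields a P-value of ordinary size.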
To see the relationship between the P-value and the closeness of the first-ranked
protein by MRFsearch to a query protein, we conduct an experiment on the
368 CAMEO target proteins. For each CAMEO target, the first-ranked protein in
the database is treated as the homolog of this target. To measure the quality of an
alignment, we use the un-normalized Global Distance Test (uGDT). GDT has been
employed as an official measure of protein model quality by CASP for many
years. When applied to alignments, uGDT can be interpreted as the number of
correctly aligned positions in an alignment, weighted by the alignment quality at
each position. We say an alignment is good when its uGDT is larger than 50. We
use 50 as a cutoff because many proteins similar at only the fold level have
uGDT around 50. Figure 3.2 shows the relationship between P-value and uGDT on
the 368 CAMEO targets. Figure 3.3 is a zoom-in graph of Fig. 3.2, showing the
relationship between P-value and uGDT on the 132 CAMEO targets with
-log(P-value) < 20. As shown in Fig. 3.3, when the P-value is small (i.e. < 10e-10),
most alignments have uGDT greater than or equal to 50. That is, when the P-value is
less than 10e-10, the first-ranked protein is very likely to share a similar fold with
the query protein. When the P-value is between 10e-5 and 10e-10, more than half of
the alignments have uGDT > 50.
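The two quantities discussed above can be illustrated with a small sketch. It assumes that uGDT follows the usual GDT_TS convention (distance cutoffs of 1, 2, 4 and 8 Angstroms, averaged) without dividing by the protein length, and that the per-position distances come from a single fixed superposition, whereas real GDT programs search over many superpositions. It also reads the two P-value thresholds in the text as 10^-10 and 10^-5. The function names and labels are hypothetical.

```python
GDT_CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # Angstroms, as in GDT_TS

# Assumption: the thresholds quoted in the text, read as powers of ten.
P_STRONG = 1e-10
P_MODERATE = 1e-5


def ugdt(distances, cutoffs=GDT_CUTOFFS):
    """Un-normalized GDT from per-position distances (in Angstroms).

    For each cutoff, count the aligned positions within that distance,
    then average the counts over the cutoffs. A position that satisfies
    every cutoff contributes 1, so the result behaves like a count of
    correctly aligned positions weighted by their quality.
    """
    counts = [sum(1 for d in distances if d <= c) for c in cutoffs]
    return sum(counts) / len(cutoffs)


def interpret_p_value(p):
    """Summarize what the text reports for a given P-value range."""
    if p < P_STRONG:
        return "very likely similar fold"
    if p < P_MODERATE:
        return "more than half of alignments had uGDT > 50"
    return "no claim made in the text"


# Made-up distances for four aligned positions.
score = ugdt([0.5, 1.5, 3.0, 9.0])
label = interpret_p_value(1e-12)
```

In this toy example the position at 0.5 Angstroms counts fully, the one at 9.0 Angstroms contributes nothing, and the others contribute fractions.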