cases among all cases that actually belong to the relevant subset, while precision is
the fraction of correct cases among those that the algorithm believes to belong to
the relevant subset.
High recall means that few relevant cases are missed, although this may come at
the cost of returning many useless results (i.e. low precision). In contrast, high
precision means that nearly all of the returned results are relevant, although not all
relevant results may have been returned (low recall).
F-measure is given by the following formula:
\[
F\text{-measure} = \frac{2\,TP}{(TP + FP) + (TP + FN)}
\]
It effectively integrates both contributory measures (precision and recall),
acknowledging the effect of TP, FP and FN:
\[
\text{precision} = \frac{TP}{FP + TP}
\qquad
\text{recall} = \frac{TP}{TP + FN}
\]
Higher F-measure values indicate better (more accurate) solutions.
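The three measures can be computed directly from the raw counts. The following is a minimal sketch, assuming the standard balanced F-measure (the harmonic mean of precision and recall); the function name, the example counts, and the convention of reporting 0.0 for an empty denominator are illustrative assumptions, not part of the original text.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) computed from raw counts.

    tp, fp, fn: numbers of true positives, false positives, false negatives.
    Assumed convention: a ratio with an empty denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    # Balanced F-measure: 2*TP over the sum of both denominators above.
    denom = (tp + fp) + (tp + fn)
    f_measure = 2 * tp / denom if denom > 0 else 0.0
    return precision, recall, f_measure


# Hypothetical counts, for illustration only.
p, r, f = precision_recall_f(tp=8, fp=2, fn=4)
print(f"precision={p:.3f} recall={r:.3f} F-measure={f:.3f}")
```

With these counts the F-measure lies between precision and recall and sits closer to the lower of the two, which is why it penalizes solutions that achieve one at the expense of the other.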
4.3.2 MCC
Another correctness measure sometimes applied in research is the Matthews
Correlation Coefficient, MCC (Altman and Bland 1994; Baldi et al. 2000;
Matthews 1975; Carugo 2007). It is derived directly from the so-called confusion
matrix and given by the following formula:
\[
MCC = \frac{(TP \cdot TN) - (FP \cdot FN)}{\sqrt{(TP + FP)\,(TP + FN)\,(TN + FP)\,(TN + FN)}}
\]
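As a minimal sketch, MCC can be computed from the four confusion-matrix counts as follows. It assumes the standard form of the coefficient (numerator TP*TN - FP*FN, square root of the four-factor product in the denominator); the function name and the choice to return 0.0 when the denominator vanishes are assumptions made for illustration.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: if any marginal sum is zero, report 0.0.
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical counts, for illustration only.
print(mcc(tp=5, tn=5, fp=0, fn=0))  # perfect prediction
print(mcc(tp=0, tn=0, fp=5, fn=5))  # every prediction wrong
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect agreement). Unlike the F-measure, it also takes TN into account.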
Sensitivity (also called recall rate) measures the proportion of actual positives
that are correctly identified as such (here, residues correctly recognized as
involved in ligand binding). In contrast, specificity is the fraction of correctly
identified negatives (residues correctly recognized as not involved in ligand
binding). It is worth noting that these coefficients closely correspond to the
concept of type I and type II errors: 1 - specificity is the false positive (type I
error) rate, while 1 - sensitivity is the false negative (type II error) rate.
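The two coefficients can be sketched in code as below. The function name and the example counts are hypothetical, and reporting 0.0 for an empty denominator is an assumed convention rather than something stated in the text.

```python
def sensitivity_specificity(tp, tn, fp, fn):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    # Sensitivity: share of actual positives (TP + FN) that were recognized.
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    # Specificity: share of actual negatives (TN + FP) that were recognized.
    specificity = tn / (tn + fp) if (tn + fp) > 0 else 0.0
    return sensitivity, specificity


# Hypothetical example: 12 binding residues, 88 non-binding residues.
sens, spec = sensitivity_specificity(tp=8, tn=86, fp=2, fn=4)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
print(f"type I error rate={1 - spec:.3f} type II error rate={1 - sens:.3f}")
```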
4.3.3 ROC Curve - Receiver Operating Characteristic
Our comparative study is further augmented by ROC curve analysis (Fawcett 2006).
 