cases among all cases that actually belong to the relevant subset, while precision is the fraction of correct cases among those that the algorithm believes to belong to the relevant subset.
High recall is understood as not missing anything (however, it may involve
returning a lot of useless results, i.e. low precision). In contrast, high precision
describes a situation where all of the returned results are relevant, although not all
relevant results may have been returned (low recall).
F-measure is given by the following formula:
\[ F\text{-measure} = \frac{2\,TP}{(TP + FP) + (TP + FN)} \]
It effectively integrates both contributory measures (precision and recall),
acknowledging the effect of TP, FP and FN:
\[ \text{precision} = \frac{TP}{FP + TP} \]

\[ \text{recall} = \frac{TP}{TP + FN} \]
Higher F-measure values indicate better (more accurate) solutions.
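As a minimal sketch of how these three measures relate, the following Python snippet computes them from raw counts. It assumes the standard F-measure, i.e. the harmonic mean of precision and recall; the function name, the zero-denominator convention and the counts are illustrative assumptions, not values from the study.

```python
def precision_recall_f(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) from confusion-matrix counts."""
    # Returning 0.0 for an empty denominator is a convention assumed here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall, written directly in TP, FP, FN.
    denom = 2 * tp + fp + fn
    f_measure = 2 * tp / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical counts: a predictor that flags many residues recovers most
# true binding residues (high recall) at the cost of many false positives
# (low precision).
p, r, f = precision_recall_f(tp=18, fp=42, fn=2)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

With these hypothetical counts recall is high while precision is low, and the F-measure falls between the two, closer to the lower value.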
4.3.2 MCC
Another correctness measure sometimes applied in research is the Matthews Correlation Coefficient, MCC (Altman and Bland 1994; Baldi et al. 2000; Matthews 1975; Carugo 2007). It is derived directly from the so-called confusion matrix and given by the following formula:
\[ MCC = \frac{(TP \cdot TN) - (FP \cdot FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \]
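The coefficient can be computed directly from the four confusion-matrix counts. The sketch below assumes the standard definition of MCC, with the square root of the product of the four marginal sums in the denominator; the function name, the handling of a zero denominator and the example counts are assumptions made for illustration.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal sum is zero; returning 0.0 in that
    # case is a common convention assumed here.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for a binding-site predictor evaluated on 200 residues.
print(round(mcc(tp=18, tn=138, fp=42, fn=2), 3))
```

Unlike the F-measure, MCC also takes true negatives into account. It ranges from -1 (complete disagreement) through 0 (no better than chance) to +1 (perfect prediction).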
Sensitivity (also called recall rate) measures the proportion of actual positives that are correctly identified as such (here, the fraction of ligand-binding residues correctly recognized as involved in ligand binding). In contrast, specificity is the proportion of actual negatives that are correctly identified (the fraction of non-binding residues correctly recognized as not involved in ligand binding). These coefficients correspond closely to the concept of type I and type II errors: the false positive rate (type I) equals 1 - specificity, and the false negative rate (type II) equals 1 - sensitivity.
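The relationship between these two coefficients and the two error types can be sketched in a few lines of Python. The formulas are the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP); the function name and the counts are hypothetical.

```python
def sensitivity_specificity(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    # Returning 0.0 for an empty denominator is a convention assumed here.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical counts: 20 binding residues and 180 non-binding residues.
sens, spec = sensitivity_specificity(tp=18, tn=138, fp=42, fn=2)

# Type I error rate: non-binding residues wrongly flagged as binding.
false_positive_rate = 1 - spec
# Type II error rate: binding residues that were missed.
false_negative_rate = 1 - sens

print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
print(f"type I rate={false_positive_rate:.2f} type II rate={false_negative_rate:.2f}")
```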
4.3.3 ROC Curve - Receiver Operating Characteristic
Our comparative study is further augmented by ROC curve analysis (Fawcett 2006).