2.2.1 PRECISION, RECALL AND F-SCORES
Many of the mining and summarization techniques described in this topic are supervised binary
classifiers, where supervised means that the classifier requires training on labeled data and binary
means we are predicting one of two classes. For example, a classifier that discriminates subjective
from non-subjective comments or informative from non-informative sentences may be trained on
data where each sentence has been labeled as belonging to one of the two classes. In other words,
with all of these tasks we are trying to discern a positive class from a negative class. In these cases,
we can evaluate the classifier using precision, recall and F-score. Precision and recall are calculated
as follows:
\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN},
\]
where TP means true positives (correctly classified as positive), FP means false positives (incorrectly
classified as positive), and FN means false negatives (incorrectly classified as negative). Note that
these two measurements share the same numerator, TP, the number of items correctly classified as
positive. To get precision, we divide the numerator by the number of items that were predicted to be
positive. To get recall, we divide the numerator by the number of items that really are positive. A
perfect classifier would have both precision and recall equal to 1, as FP and FN would be equal to 0
(i.e., no data would be incorrectly classified as positive or negative).
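As a concrete illustration (our own sketch, not code from the text), the following Python function computes precision and recall from parallel lists of gold labels and predicted labels; the function and variable names here are hypothetical:

```python
def precision_recall(gold, predicted):
    """Compute precision and recall for the positive class (label 1).

    gold and predicted are parallel sequences of 0/1 labels.
    """
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Example: 4 items really are positive, 3 are predicted positive,
# and 2 of those predictions are correct.
gold      = [1, 1, 1, 1, 0, 0]
predicted = [1, 1, 0, 0, 1, 0]
p, r = precision_recall(gold, predicted)
print(p, r)  # precision = 2/3 ~ 0.667, recall = 2/4 = 0.5
```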
The F-score is simply a combination of precision and recall. The harmonic mean is typically
used, which is given by the following equation when precision and recall are weighted equally:
\[
F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.
\]
A perfect classifier would have an F-score equal to 1.
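Continuing the sketch above (again our own illustration), the balanced F-score is simply the harmonic mean of the two values just computed:

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (equally weighted F-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# With precision = 2/3 and recall = 1/2 from the example above:
print(f_score(2/3, 1/2))  # 4/7 ~ 0.571
```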
2.2.2 ROC CURVES
Many of the mining and summarization techniques described in this topic rely on probabilistic binary
classifiers, which assign to each data instance (e.g., a sentence) a posterior probability of belonging to
a certain class, given the evidence, e.g., the sentence features used.
When calculating precision, recall and F-score for a probabilistic classifier, we evaluate the
classifier at a particular posterior probability threshold, where we consider a data instance to be
“positive”, i.e., to belong to the class, if the classifier's posterior probability for that particular instance
is greater than or equal to a threshold and “negative” otherwise. A commonly used threshold is 0.5,
the midpoint of the [0, 1] probability range.
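To make the thresholding step concrete, here is a minimal sketch (ours; the posterior values are hypothetical) that converts posterior probabilities into hard positive/negative decisions at a given threshold:

```python
def classify(posteriors, threshold=0.5):
    """Map posterior probabilities of the positive class to hard 0/1 labels.

    An instance is "positive" when its posterior is >= threshold.
    """
    return [1 if p >= threshold else 0 for p in posteriors]

posteriors = [0.9, 0.5, 0.3, 0.71]  # hypothetical classifier outputs
print(classify(posteriors))         # [1, 1, 0, 1]
```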
Arguably, a more informative alternative is to evaluate the classifier across all possible proba-
bility thresholds between 0 and 1. In practice, we can measure the true-positive/false-positive rates
as the posterior threshold is varied. The true-positive rate (TPR) and false-positive rate (FPR) are
calculated as follows:
\[
\text{TPR} = \frac{TP}{TP + FN}, \qquad
\text{FPR} = \frac{FP}{FP + TN},
\]
where TN means true negatives (correctly classified as negative).
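The following sketch (our own illustration, assuming the standard definitions of TPR and FPR above) traces the (FPR, TPR) pairs as the threshold sweeps over [0, 1]; plotted and connected, these points form the ROC curve:

```python
def roc_points(gold, posteriors, num_thresholds=101):
    """Compute (FPR, TPR) pairs as the posterior threshold varies over [0, 1].

    gold: parallel sequence of 0/1 labels;
    posteriors: positive-class probabilities from the classifier.
    """
    points = []
    for i in range(num_thresholds):
        t = i / (num_thresholds - 1)  # threshold swept from 0.0 to 1.0
        predicted = [1 if p >= t else 0 for p in posteriors]
        tp = sum(1 for g, q in zip(gold, predicted) if g == 1 and q == 1)
        fp = sum(1 for g, q in zip(gold, predicted) if g == 0 and q == 1)
        fn = sum(1 for g, q in zip(gold, predicted) if g == 1 and q == 0)
        tn = sum(1 for g, q in zip(gold, predicted) if g == 0 and q == 0)
        tpr = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        fpr = fp / (fp + tn) if (fp + tn) > 0 else 0.0
        points.append((fpr, tpr))
    return points

gold = [1, 0, 1, 1, 0]
posteriors = [0.9, 0.4, 0.65, 0.2, 0.5]  # hypothetical outputs
print(roc_points(gold, posteriors, num_thresholds=5))
```

At a threshold of 0 every instance is labeled positive (TPR = FPR = 1), and at a threshold above the largest posterior every instance is labeled negative (TPR = FPR = 0), so the curve always spans the two corners of the unit square.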