Text Search-Enhanced with Types and Entities - Text Mining: Classification, Clustering, and Applications - page 251

Agreement between the volunteers was almost perfect. We will call these designated informer spans “perfect” informers.

10.2.4.1 Informer span tagging accuracy

Each question has a known set $I_k$ of informer tokens, and gets a set of tokens $I_c$ flagged as informers by the CRF. For each question, we can grant ourselves a reward of 1 if $I_c = I_k$, and 0 otherwise. This strict equality check can be harsh, because the second-level SVM classifier may well classify correctly despite small perturbations in the feature bag derived from informers. In Section 10.2.3.1, informer-based features were placed in a separate bag. Therefore, the overlap between $I_c$ and $I_k$ would be a reasonable predictor of question classification accuracy. We use the Jaccard similarity $|I_k \cap I_c| / |I_k \cup I_c|$.
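The two scoring rules described here can be sketched in a few lines of Python. This is a minimal illustration and not the authors' evaluation code: the function names and the example question are hypothetical, and representing informers as sets of token positions is an assumption.

```python
def exact_match(known, predicted):
    """Return 1 if the predicted informer set equals the known set, else 0."""
    return 1 if set(known) == set(predicted) else 0


def jaccard(known, predicted):
    """Jaccard similarity: size of the intersection over size of the union."""
    known, predicted = set(known), set(predicted)
    union = known | predicted
    if not union:
        # Assumption: two empty informer sets count as a perfect match.
        return 1.0
    return len(known & predicted) / len(union)


# Hypothetical question, with token positions standing in for informer tokens.
I_k = {2, 3}  # known ("perfect") informer tokens
I_c = {3}     # tokens flagged as informers by the tagger

print(exact_match(I_k, I_c))  # 0
print(jaccard(I_k, I_c))      # 0.5
```

In this made-up case the strict check gives no credit at all, while the Jaccard score still rewards the one shared token, which is the motivation for reporting both measures.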

Features used            Fraction $I_c = I_k$   Jaccard overlap
IsTag                    0.368                  0.396
+IsNum                   0.474                  0.542
+IsPrevTag+IsNextTag     0.692                  0.751
+IsEdge+IsBegin+IsEnd    0.848                  0.867

FIGURE 10.8: Effect of feature choices.

Feature ablation study: Figure 10.8 shows the effect of using diverse feature sets on the accuracy of the CRF, measured both ways (exact match and Jaccard overlap). We make the following observations:

• By themselves, IsTag features are quite inadequate at producing acceptable accuracy.

• IsNum features improve accuracy by 10-20 percentage points.

• IsPrevTag and IsNextTag (“+Prev +Next”) add over 20 percentage points of accuracy.

• IsEdge transition features help exploit Markovian dependencies and add another 10-15 percentage points of accuracy, showing that sequential models are indeed required.
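The per-step gains quoted in these observations can be checked against Figure 10.8 with a short calculation. The sketch below is only a reader's aid: the values are copied from the figure, while the variable and function names are hypothetical.

```python
# Rows of Figure 10.8: (features added, exact-match fraction, Jaccard overlap).
FIGURE_10_8 = [
    ("IsTag", 0.368, 0.396),
    ("+IsNum", 0.474, 0.542),
    ("+IsPrevTag+IsNextTag", 0.692, 0.751),
    ("+IsEdge+IsBegin+IsEnd", 0.848, 0.867),
]


def stepwise_gains(rows):
    """Absolute gain of each row over the previous one, in percentage points."""
    gains = []
    for (_, prev_exact, prev_jac), (name, exact, jac) in zip(rows, rows[1:]):
        gains.append((name,
                      round(100 * (exact - prev_exact), 1),
                      round(100 * (jac - prev_jac), 1)))
    return gains


for name, exact_gain, jaccard_gain in stepwise_gains(FIGURE_10_8):
    print(f"{name:24s} exact: +{exact_gain:.1f}  Jaccard: +{jaccard_gain:.1f}")
```

Computed this way, the differences are absolute (percentage points of accuracy) rather than relative improvements over the previous row.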

Benefits from non-local chunk features: We have commented before on the potential benefits from our feature design procedure in Section 10.2.2.1. To test if such an elaborate procedure is actually beneficial, we limited the number of levels from Figure 10.5 that were converted into CRF features. Figure 10.9 shows the results. “1” corresponds to features generated from
