Type    #Quest.   Heuristic informers   2-state CRF   3-state CRF
what        349                  57.3          68.2          83.4
which        11                  77.3          83.3          77.2
when         28                  75.0          98.8         100.0
where        27                  84.3         100.0          96.3
who          47                  55.0          47.2          96.8
how *        32                  90.6          88.5          93.8
rest          6                  66.7          66.7          77.8
Total       500                  62.4          71.2          86.7

FIGURE 10.10: Effect of number of CRF states, and comparison with the
heuristic baseline (Jaccard accuracy expressed as %).
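The figure reports Jaccard accuracy for informer spans. The following is a minimal sketch, assuming the score is computed per question as the overlap between the sets of token positions in the predicted and gold informer spans and then averaged over questions. The function names and the convention for two empty spans are our own assumptions, not taken from the chapter.

```python
from typing import Iterable, Set, Tuple


def jaccard(predicted: Set[int], gold: Set[int]) -> float:
    """Size of the intersection divided by size of the union of two position sets."""
    union = predicted | gold
    if not union:
        # Assumption: if both spans are empty, count it as full agreement.
        return 1.0
    return len(predicted & gold) / len(union)


def mean_jaccard_percent(pairs: Iterable[Tuple[Set[int], Set[int]]]) -> float:
    """Average per-question Jaccard overlap, expressed as a percentage."""
    scores = [jaccard(predicted, gold) for predicted, gold in pairs]
    return 100.0 * sum(scores) / len(scores) if scores else 0.0


# "What is the capital city of Japan ?" -> positions 0..7; the gold informer
# is "capital city" = {3, 4}; a tagger that marks only "city" yields {4}.
print(jaccard({4}, {3, 4}))  # 0.5
print(mean_jaccard_percent([({4}, {3, 4}), ({1, 2}, {1, 2})]))  # 75.0
```

Under this definition a partially correct span earns partial credit, which is why the per-type numbers in the table are not restricted to exact-match rates.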
10.2.4.2 Question classification accuracy
Because our classification system has two levels (a CRF followed by an SVM),
our evaluation also proceeds in two stages. First, we will evaluate the
accuracy of the SVM assuming that “perfect” (i.e., human-generated) informer
spans are available during both training and testing. Second, we will
evaluate the more realistic setting in which the CRF provides the informer span.
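The two evaluation stages differ only in where the test-time informer span comes from. A minimal sketch of such a harness, assuming an already-trained classifier: `atype_accuracy`, `toy_classify`, `next_word_tagger` and the tiny test set are hypothetical illustrations, not the chapter's code or data.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

Tokens = Sequence[str]
# Hypothetical interfaces: a span tagger (CRF or heuristic) and a trained
# atype classifier (e.g., an SVM) that sees the question plus an informer span.
SpanTagger = Callable[[Tokens], Set[int]]
AtypeClassifier = Callable[[Tokens, Set[int]], str]


def atype_accuracy(
    test_set: List[Tuple[Tokens, Set[int], str]],
    classify: AtypeClassifier,
    tagger: Optional[SpanTagger] = None,
) -> float:
    """Percentage of test questions whose predicted atype is correct.

    With tagger=None, each question's gold ("perfect") informer span is used
    (stage one); otherwise the span comes from the tagger (stage two).
    """
    correct = 0
    for tokens, gold_span, gold_atype in test_set:
        span = gold_span if tagger is None else tagger(tokens)
        if classify(tokens, span) == gold_atype:
            correct += 1
    return 100.0 * correct / len(test_set)


# Hypothetical lexicon-based classifier standing in for a trained SVM.
LEXICON = {"city": "location", "year": "date"}


def toy_classify(tokens: Tokens, span: Set[int]) -> str:
    for i in sorted(span):
        if tokens[i] in LEXICON:
            return LEXICON[tokens[i]]
    return "other"


def next_word_tagger(tokens: Tokens) -> Set[int]:
    # Naive hypothetical heuristic: the informer is the second token.
    return {1}


TEST_SET = [
    (["what", "city", "hosted", "the", "games"], {1}, "location"),
    (["in", "what", "year", "did", "it", "end"], {2}, "date"),
]

print(atype_accuracy(TEST_SET, toy_classify))                    # 100.0
print(atype_accuracy(TEST_SET, toy_classify, next_word_tagger))  # 50.0
```

Keeping the classifier fixed and swapping only the span source isolates how much accuracy is lost to imperfect informer tagging.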
Benefits from “perfect” informers: Figure 10.11 shows that the baseline
word unigram SVM is already competitive with the best previously reported
accuracies, and that exploiting perfect informer spans beats all previously
reported numbers. Both informer q-grams and informer hypernyms are clearly
very valuable features for question classification. The fact that question
bigrams gave no improvement over question hypernyms highlights the
importance of not treating all question tokens uniformly, and of recognizing
that some of them play a special role in predicting the atype.
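Informer q-grams and informer hypernyms can be illustrated with a small feature extractor. This is a sketch under stated assumptions: q-grams are taken to be contiguous word n-grams of length 1 to `max_q` inside a contiguous informer span, and `TOY_HYPERNYMS` is a hypothetical hand-made table standing in for a real hypernym source such as WordNet.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical hand-made hypernym table standing in for a real resource
# such as WordNet; the entries are illustrative only.
TOY_HYPERNYMS: Dict[str, List[str]] = {
    "capital": ["seat", "center", "location"],
    "city": ["municipality", "urban_area", "geographical_area", "location"],
}


def informer_features(tokens: Sequence[str], span: Set[int], max_q: int = 2) -> Set[str]:
    """Informer q-grams (lengths 1..max_q) plus hypernyms of informer words.

    Assumes the informer span is contiguous, so sorted positions form a run.
    """
    words = [tokens[i].lower() for i in sorted(span)]
    feats: Set[str] = set()
    # Informer q-grams: every contiguous run of 1..max_q informer words.
    for q in range(1, max_q + 1):
        for start in range(len(words) - q + 1):
            feats.add("inf_qgram=" + "_".join(words[start:start + q]))
    # Informer hypernyms: look up each informer word in the toy table.
    for w in words:
        for h in TOY_HYPERNYMS.get(w, []):
            feats.add("inf_hyp=" + h)
    return feats


question = ["What", "is", "the", "capital", "city", "of", "Japan"]
feats = informer_features(question, {3, 4})
print(len(feats))  # 9
# ['inf_qgram=capital', 'inf_qgram=capital_city', 'inf_qgram=city']
print(sorted(f for f in feats if f.startswith("inf_qgram=")))
```

Only the informer words contribute here; "Japan" yields no feature, which is the non-uniform treatment of question tokens that the paragraph above argues for.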
Figure 10.12 summarizes this section. Column (a) shows the performance of
an SVM question classifier that does not use informers but uses only word
bigrams and their hypernyms. Columns (b), (c) and (d) show the accuracies
obtained with informer-based features alone. Column (b) uses manually tagged
“perfect” informers. Column (c) uses heuristic informers, which often
perform worse, especially for what and which questions. Informer spans
tagged by the CRF perform between perfect informers and heuristic informers.
Columns (e), (f) and (g) show the best-performing settings, in which
informer features are used in conjunction with the baseline features from
all question bigrams and their hypernyms. Again, CRF-tagged informers fall
between perfect and heuristic informers, but closer to perfect informers on
average.
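One common way to use informer features in conjunction with baseline features is to give each feature family its own name prefix and take the union, so that a word inside the informer and the same word elsewhere in the question become separate dimensions for a linear SVM. The sketch below assumes this namespacing scheme, a hypothetical `TOY_HYPERNYMS` table in place of WordNet, per-word hypernyms for the baseline, and, for brevity, plain informer words instead of full informer q-grams and hypernyms; it is not necessarily how the chapter builds its feature vectors.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical stand-in for a WordNet hypernym lookup (illustrative entries).
TOY_HYPERNYMS: Dict[str, List[str]] = {"city": ["location"], "japan": ["country"]}


def baseline_features(words: Sequence[str]) -> Set[str]:
    """Column (a)-style features: all question bigrams plus word hypernyms."""
    feats = {f"q_bigram={a}_{b}" for a, b in zip(words, words[1:])}
    for w in words:
        feats.update(f"q_hyp={h}" for h in TOY_HYPERNYMS.get(w, []))
    return feats


def informer_words(words: Sequence[str], span: Set[int]) -> Set[str]:
    """Simplified informer features: one feature per informer word."""
    return {f"inf_word={words[i]}" for i in span}


def combined_features(tokens: Sequence[str], span: Set[int]) -> Set[str]:
    """Baseline features united with informer features.

    Distinct prefixes keep "city inside the informer" and "city elsewhere in
    the question" as separate dimensions, so the SVM can weight them apart.
    """
    words = [t.lower() for t in tokens]
    return baseline_features(words) | informer_words(words, span)


question = ["What", "is", "the", "capital", "city", "of", "Japan"]
print(len(combined_features(question, {3, 4})))  # 10
```

Informer tokens therefore contribute twice, once as ordinary question tokens and once under the informer prefix, which gives the classifier room to learn their special role.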
 