in tagging, the SVM is given a chance to correlate these mistakes with the true
label. In contrast, in the first approach, the SVM may see test data that
is distributionally different from the training data, because the training
data is of higher quality: its informer spans are human-generated. For these
reasons, we implemented the second option. We have anecdotal evidence that
the accuracy of the second approach is somewhat higher, because we subject
the SVM to the limitations of the CRF output uniformly during both training
and testing.
The SVM used is a linear multi-class one-vs-one SVM,2 as in the Zhang
and Lee (40) baseline. We do not use ECOC (16) because the reported gain is
less than 1%. Through tuning, we found that the SVM “C” parameter (used
to trade off training-data fit against model complexity) must be set to 300
to achieve published baseline numbers.
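The setup above can be sketched concretely. The snippet below is an illustrative reconstruction, not the authors' code: it uses scikit-learn's `SVC`, which wraps the LIBSVM library cited in the footnote, with a linear kernel, one-vs-one multi-class decomposition, and the tuned value C = 300. The toy questions and labels are invented for illustration.

```python
# Sketch of the classifier configuration described in the text, using
# scikit-learn's SVC (a wrapper around LIBSVM).  The questions, labels,
# and feature extractor here are toy stand-ins; C=300 is the tuned
# value reported in the text.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

# Toy training questions with coarse UIUC-style labels (illustrative only).
questions = [
    "What is the height of Mount Everest ?",
    "How far is the Moon from Earth ?",
    "Who wrote Hamlet ?",
    "Who is the CEO of IBM ?",
]
labels = ["NUMBER:distance", "NUMBER:distance",
          "HUMAN:individual", "HUMAN:individual"]

# Word 1-gram features, as in the SVM baseline.
vec = CountVectorizer()
X = vec.fit_transform(questions)

# Linear multi-class SVM; LIBSVM decomposes the multi-class problem
# into one-vs-one binary subproblems internally.
clf = SVC(kernel="linear", C=300, decision_function_shape="ovo")
clf.fit(X, labels)

print(clf.predict(vec.transform(["How long is the Nile ?"])))
```

With C this large, the SVM strongly penalizes training errors, which matches the text's observation that a high C was needed to reproduce the published baseline.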
10.2.3.1 Informer q-gram features
Our main modification to earlier SVM-based approaches is in generating
features from informers. In earlier work, word features were generated from
word q-grams. We can apply the same method to the informer span: e.g.,
for the question “What is the height of Mount Everest?”, where height is the
informer span, we generate a feature corresponding to height. (We also
generate regular word features; therefore we tag the features so that
‘height’ occurring inside the informer span generates a distinct feature from
‘height’ occurring outside the informer span.)
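The tagging scheme just described can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `(start, end)` span representation and the `word:`/`informer:` prefixes are assumptions made for clarity, and only unigram (q = 1) features are shown; higher-order q-grams over the span are handled analogously.

```python
# Minimal sketch of tagged feature generation.  Tokens inside the
# informer span get an "informer:" prefix, so 'height' inside the
# span and 'height' outside it become distinct features.
def question_features(tokens, informer_span):
    """tokens: list of words; informer_span: (start, end) token
    offsets, end exclusive, or None if no informer was found."""
    start, end = informer_span if informer_span else (0, 0)
    feats = []
    for i, tok in enumerate(tokens):
        tok = tok.lower()
        feats.append("word:" + tok)          # regular word feature
        if start <= i < end:
            feats.append("informer:" + tok)  # tagged informer feature
    return feats

toks = "What is the height of Mount Everest ?".split()
# Token 3 ('height') is the informer span, so it yields both
# "word:height" and "informer:height".
print(question_features(toks, (3, 4)))
```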
As in regular text classification, the goal is to reveal to the learner
important correlations between informer features and question classes, e.g.,
the UIUC label system has a class called NUMBER:distance . We would expect
informers like length or height to be strongly correlated with the class label
NUMBER:distance .
10.2.3.2 Informer hypernym features
Another set of features generated from informer tokens proves to be
valuable. The class label NUMBER:distance is correlated with a number of
potential informer q-grams, such as height, how far, how long, how many
miles, etc. In an ideal setting, given very large amounts of labeled data, all
such correlations can be learnt automatically. In real life, training data is
limited. As a second example, the UIUC label system has a single coarse-grained
class called HUMAN:individual, whereas questions may use diverse
atype informer tokens like author, cricketer, or CEO.
There are prebuilt databases such as WordNet (30) where explicit
hypernym-hyponym (x is a kind of y) relations are cataloged as a directed
acyclic graph of types. For example, author, cricketer, CEO would all connect
2 http://www.csie.ntu.edu.tw/~cjlin/libsvm/