Features                            Coarse   Fine
Question trigrams                     91.2   77.6
All question q-grams                  87.2   71.8
All question unigrams                 88.4   78.2
Question bigrams                      91.6   79.4
+informer q-grams                     94.0   82.4
+informer hypernyms                   94.2   88.0
Question unigrams + all informer      93.4   88.0
Only informer                         92.2   85.0
Question bigrams + hypernyms          91.6   79.4
FIGURE 10.11: Percent accuracy with linear SVMs, “perfect” informer spans, and various feature encodings. The 'Coarse' column is for the 6 top-level UIUC classes and the 'Fine' column is for the 50 second-level classes.
10.3 Scoring Potential Answer Snippets
In Section 10.2 we established that atypes can be inferred from a natural language question with high accuracy. The atype extraction step is an important part of question preprocessing, because it lets us partition question tokens into

- tokens that express the user's information need as a type to be instantiated, but which need not literally appear in a correct response document or snippet, and
- tokens that the user expects to literally match correct response documents or snippets; we call these selector tokens.
For example, the question “What is the distance between Paris and Rome?” gets partitioned into

- Atype: NUMBER:distance (UIUC system) or distance#n#3 (WordNet system)
- Selectors: Paris and Rome, which can be used to shortlist documents and snippets that qualify to be scored
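To make this partition concrete, here is a minimal Python sketch of the preprocessing output; the class and field names (PreprocessedQuestion, atype, selectors) are illustrative, not taken from any particular QA system:

```python
from dataclasses import dataclass

@dataclass
class PreprocessedQuestion:
    atype: str            # inferred answer type, e.g., "NUMBER:distance"
    selectors: list[str]  # tokens expected to appear literally in answers

# "What is the distance between Paris and Rome?"
q = PreprocessedQuestion(
    atype="NUMBER:distance",      # UIUC label; "distance#n#3" under WordNet
    selectors=["Paris", "Rome"],  # used to shortlist candidate snippets
)
```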
In this section we set up a machine learning framework to assign scores to
snippets that potentially answer the question.
In traditional Information Retrieval, the extent of match between the query
q and a candidate document d is often measured as the cosine of the angle
between q and d represented as vectors in the Vector Space Model (33). Each
word in the lexicon is represented by an axis in the vector space.
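Concretely, the cosine measure is the inner product of the two vectors divided by the product of their lengths. The sketch below uses raw term counts for brevity; practical systems weight each axis, for example by TF-IDF:

```python
import math
from collections import Counter

def cosine(q_tokens, d_tokens):
    """Cosine of the angle between query and document term-count vectors."""
    q, d = Counter(q_tokens), Counter(d_tokens)
    dot = sum(q[t] * d[t] for t in q)  # inner product over shared terms
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0

# Score a candidate snippet against the question's terms.
print(cosine(["distance", "paris", "rome"],
             ["the", "distance", "from", "paris", "to", "rome", "is", "1105", "km"]))
```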