for the 50 fine-grained classes was surprisingly small. The authors explain
this in the following terms: “the syntactic tree does not normally contain the
information required to distinguish between the various fine categories within
a coarse category.”
10.2.1 Answer Type Clues in Questions
We contend that the above methods for generating features from the
question overload the learner with too many features, most of them far from
the critical question tokens that reveal the richest clues to the atype.
In fact, our experiments show that a very short subsequence of question
tokens (typically 1-3 words) is an adequate clue for question classification,
at least by humans. We call these segments informer spans. This is certainly
true of the most trivial atypes (Who wrote Hamlet? or How many dogs pull a
sled at Iditarod?) but is also true of more subtle clues (How much does a rhino
weigh?). Informal experiments revealed the surprising property that only one
segment is enough. In the above question, a human does not even need the
how much clue (which hints at only a generic quantity) once the word weigh is
available. In fact, “How much does a rhino cost?” has an identical syntax but
an atype that is a completely different subtype of “quantity,” not revealed by
how much alone. The only exceptions to the single-span hypothesis are multi-
function questions like “What is the name and age of ...,” which should be
assigned to multiple answer types. In this paper we consider questions where
one type suffices.
Consider another question with multiple clues: Who is the CEO of IBM?
In isolation, the clue who merely tells us that the answer might be a person or
country or perhaps an organization, while CEO is perfectly precise, rendering
who unnecessary. All of the above applies a fortiori to what and which
clues, which are essentially uninformative on their own, as in “What is the
distance between Pisa and Rome?”
The informer span is very sensitive to the structure of clauses, phrases
and possessives in the question, as is clear from these examples (informers
italicized): “What is Bill Clinton's wife's profession,” and “What country's
president was shot at Ford's Theater.” Depending on sentence structure, the
informer can be near to or far from question triggers like what, which and
how.
The choice of informer spans also depends on the target classification
system. Initially we wished to handle definition questions separately, and
marked no informer tokens in “What is digitalis?” However, “what is” is an
excellent informer for the UIUC question class DESC:def (“definition”).
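To make the informer-span idea concrete, the following is a minimal sketch of a classifier that predicts a question's atype from features drawn only from a hand-labeled informer span, together with the leading wh-word. It is an illustration under assumptions, not the system described in this chapter: the feature templates (informer unigrams, the whole span, and the wh-word), the toy training examples, and the use of scikit-learn's LinearSVC are ours; the actual feature generation is described in Section 10.2.3.

# Minimal sketch (illustrative assumptions, not the chapter's system):
# predict a question's answer type from features drawn only from a
# hand-labeled informer span plus the leading wh-word, using a linear SVM.
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Each entry: (question tokens, informer span as [start, end) token offsets,
# UIUC-style class label).  Spans and labels here are hypothetical examples.
train = [
    ("who wrote Hamlet".split(),                            (0, 1), "HUM:ind"),
    ("how much does a rhino weigh".split(),                 (5, 6), "NUM:weight"),
    ("how much does a rhino cost".split(),                  (5, 6), "NUM:money"),
    ("what is the distance between Pisa and Rome".split(),  (3, 4), "NUM:dist"),
]

def informer_features(tokens, span):
    # Only the wh-word and the tokens inside the informer span contribute;
    # everything else in the question is ignored.
    lo, hi = span
    feats = {"wh=" + tokens[0].lower(): 1.0}
    for tok in tokens[lo:hi]:
        feats["informer=" + tok.lower()] = 1.0
    feats["informer_span=" + "_".join(t.lower() for t in tokens[lo:hi])] = 1.0
    return feats

X = [informer_features(toks, span) for toks, span, _ in train]
y = [label for _, _, label in train]

model = make_pipeline(DictVectorizer(), LinearSVC())
model.fit(X, y)

# "Who is the CEO of IBM?" with informer span "CEO" (token offset 3).
print(model.predict([informer_features("who is the CEO of IBM".split(), (3, 4))]))

The point of the sketch is only that the learner never sees tokens outside the informer span and the wh-word; how such features are actually generated and combined in this chapter is deferred to Section 10.2.3.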
Before we get into the job of annotating the question with the informer
segment, we summarize the accuracy obtained by some of the approaches
reviewed earlier, as well as by a linear SVM that was provided with suitable
features generated from the informer segment (details in Section 10.2.3). If
“perfect” informer spans are labeled by hand, and features generated only