10.1.3 Efficient Indexing and Query Processing
Having decided on a ranking function, the third problem is to build indexes
and design a query-processing algorithm. The scoring paradigm indicated
above leads to an interesting performance trade-off. We can expand the query
atype to all ground instantiations, but this will be very expensive, especially
for very broad atypes. Or we can index all atype ancestors of each token, but
that will lead to unacceptable bloating of the index. Can we hit a practical
middle ground? That is the topic of Section 10.4.
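
To make the trade-off concrete, here is a minimal sketch on a toy taxonomy. It is not the chapter's index design: the taxonomy, the token list, and every function name are hypothetical, and "ground instantiations" are simplified to descendant atypes. Strategy 1 keeps a small index and pays with one lookup per descendant; strategy 2 answers with a single lookup but posts every token under all of its ancestors.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "person": "entity",
    "scientist": "person",
    "physicist": "scientist",
    "chemist": "scientist",
    "city": "entity",
}

# Hypothetical corpus: (token position, most specific atype of the token).
TOKENS = [(0, "physicist"), (1, "city"), (2, "chemist"), (3, "physicist")]


def ancestors(atype):
    """Return atype followed by all of its ancestors up to the root."""
    chain = []
    while atype is not None:
        chain.append(atype)
        atype = PARENT[atype]
    return chain


def descendants(atype):
    """Return atype and every atype below it in the taxonomy."""
    result = [atype]
    for child, parent in PARENT.items():
        if parent == atype:
            result.extend(descendants(child))
    return result


def build_index(index_ancestors):
    """Map atype -> token positions; optionally post under all ancestors."""
    index = defaultdict(list)
    for pos, atype in TOKENS:
        keys = ancestors(atype) if index_ancestors else [atype]
        for key in keys:
            index[key].append(pos)
    return index


def query_by_expansion(index, atype):
    """Strategy 1: small index, one lookup per descendant atype."""
    keys = descendants(atype)
    hits = sorted(pos for key in keys for pos in index.get(key, []))
    return hits, len(keys)


def query_by_ancestor_index(index, atype):
    """Strategy 2: larger index, a single lookup."""
    return sorted(index.get(atype, [])), 1


def index_size(index):
    """Total number of postings stored in the index."""
    return sum(len(postings) for postings in index.values())
```

Both strategies return the same token positions for a query such as "scientist"; they differ only in the number of lookups and the number of postings stored. On a real taxonomy with broad atypes, both costs grow much faster than in this toy example.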
Figure 10.2 shows our overall system. The modules with heavy dotted
outlines are described at length here.
10.1.4 Comparison with Prior Work
Related work exists in several areas: question answering (QA), information
retrieval (IR) and databases (DB). The key difference from standard QA
systems is that we are not after a black-box solution; instead, we wish
to approximately “translate” well-formed questions into a semi-structured
form, and then give precise semantics for executing this form of semi-
structured queries. The notion of an atype appears often in the QA literature.
Meanwhile, many projects in the IR and DB communities deal with fast top-
k queries over feature vectors or tuples, but they do not consider lexical
proximity. XML search systems need to support path reachability queries,
but we know of no system that integrates reachability with lexical proximity
and supports a graceful trade-off between index space and query time.
10.2 Understanding the Question
Well-formed questions that seek a single entity or attribute of a given
type can be a great help to the search engine, as compared to 2-3 word
“telegraphic” queries.
Most successful QA systems first map the question to one or a few
likely atypes. This step is called “question classification” or “answer type
identification.” The answer type is usually picked from a hand-built taxonomy
having dozens to hundreds of answer types (17; 18; 25; 41; 13).
There are two major approaches to question classification. Earlier, rule-
based classification was used. A manually-constructed set of rules mapped
the question to a type. The rules exploited clues such as the wh-word (who,
where, when, how many) and the head of noun phrases associated with the
main verb (what is the tallest mountain in ...). Rule-based systems are
difficult to maintain and can be brittle.
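
As an illustration of the rule-based approach, the following minimal sketch maps a question to an atype using the wh-word and a crude keyword stand-in for the head noun. The rule set, the atype labels, and the function name are hypothetical and are not taken from any of the cited systems.

```python
import re

# Hypothetical hand-written rules: (pattern, atype label). First match wins.
RULES = [
    (re.compile(r"^how (many|much)\b"), "quantity"),
    (re.compile(r"^who\b"), "person"),
    (re.compile(r"^where\b"), "location"),
    (re.compile(r"^when\b"), "time"),
    # Crude stand-in for head-noun extraction: "what/which ... <noun>".
    (re.compile(r"^(what|which)\b.*\b(mountain|river|city|country)\b"),
     "location"),
]


def classify_question(question):
    """Return the atype of the first matching rule, or "unknown"."""
    text = question.strip().lower()
    for pattern, atype in RULES:
        if pattern.search(text):
            return atype
    return "unknown"
```

The brittleness mentioned above shows up quickly: "What is the tallest mountain in Africa?" matches the head-noun rule, but "What is the capital of France?" falls through to "unknown" until someone adds another rule by hand.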
More recently, question classification, following other prominent tasks in
 