Text Search-Enhanced with Types and Entities - Text Mining: Classification, Clustering, and Applications


10.1.3 Efficient Indexing and Query Processing

Having decided on a ranking function, the third problem is to build indexes

and design a query-processing algorithm. The scoring paradigm indicated

above leads to an interesting performance trade-off. We can expand the query

atype to all ground instantiations, but this will be very expensive, especially

for very broad atypes. Or we can index all atype ancestors of each token, but

that will lead to unacceptable bloating of the index. Can we hit a practical

middle ground? That is the topic of Section 10.4.
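
The two extremes of this trade-off can be illustrated with a small sketch. The taxonomy, the annotated mentions, and all function names below are hypothetical and chosen only for illustration; this is not the indexing scheme of Section 10.4, just a toy comparison of index size against the number of posting-list probes per query.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "person": "entity",
    "scientist": "person",
    "physicist": "scientist",
    "chemist": "scientist",
    "place": "entity",
    "city": "place",
}

# Hypothetical annotations: (doc_id, token_position, most specific type).
MENTIONS = [
    (1, 4, "physicist"),
    (1, 9, "city"),
    (2, 2, "chemist"),
    (3, 7, "physicist"),
]


def ancestors(atype):
    """Return atype followed by all of its ancestors up to the root."""
    chain = []
    while atype is not None:
        chain.append(atype)
        atype = PARENT[atype]
    return chain


def descendants(atype):
    """Return atype and every type below it in the taxonomy."""
    found = [atype]
    for child, parent in PARENT.items():
        if parent == atype:
            found.extend(descendants(child))
    return found


def build_index(index_ancestors):
    """Build type -> postings; optionally post each mention under all ancestors."""
    index = defaultdict(list)
    for doc, pos, leaf in MENTIONS:
        keys = ancestors(leaf) if index_ancestors else [leaf]
        for key in keys:
            index[key].append((doc, pos))
    return index


def index_size(index):
    """Total number of postings stored in the index."""
    return sum(len(postings) for postings in index.values())


def lookup_expanded(index, atype):
    """Small index: expand the query atype, one probe per descendant type."""
    probes = descendants(atype)
    hits = sorted(p for t in probes for p in index.get(t, []))
    return hits, len(probes)


def lookup_direct(index, atype):
    """Large index: a single probe, since ancestors were indexed up front."""
    return sorted(index.get(atype, [])), 1


small = build_index(index_ancestors=False)
large = build_index(index_ancestors=True)
hits_small, probes_small = lookup_expanded(small, "scientist")
hits_large, probes_large = lookup_direct(large, "scientist")
print(index_size(small), probes_small)
print(index_size(large), probes_large)
```

Both lookups return the same mentions. The leaf-only index stores one posting per mention but needs one probe per descendant of the query atype, while the ancestor index answers in a single probe at the cost of one extra posting per taxonomy level above each mention.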

Figure 10.2 shows our overall system. The modules with heavy dotted

outlines are described at length here.

10.1.4 Comparison with Prior Work

Related work exists in several areas: question answering (QA), information

retrieval (IR) and databases (DB). The key difference from standard QA

systems is that we are not after a black-box solution; instead, we wish

to approximately “translate” well-formed questions into a semi-structured

form, and then give precise semantics for executing this form of semi-

structured queries. The notion of an atype appears often in the QA literature.

Meanwhile, many projects in the IR and DB communities deal with fast top-

k queries over feature vectors or tuples, but they do not consider lexical

proximity. XML search systems need to support path reachability queries,

but we know of no system that integrates reachability with lexical proximity

and supports a graceful trade-off between index space and query time.

10.2 Understanding the Question

Well-formed questions that seek a single entity or attribute of a given

type can be a great help to the search engine, as compared to 2-3 word

“telegraphic” queries.

Most successful QA systems first map the question to one or a few

likely atypes. This step is called “question classification” or “answer type

identification.” The answer type is usually picked from a hand-built taxonomy

having dozens to hundreds of answer types (17; 18; 25; 41; 13).

There are two major approaches to question classification. Earlier, rule-

based classification was used. A manually-constructed set of rules mapped

the question to a type. The rules exploited clues such as the wh-word (who,

where, when, how many) and the head of noun phrases associated with the

main verb (what is the tallest mountain in ...). Rule-based systems are

difficult to maintain and can be brittle.
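
A minimal sketch of such a rule-based classifier follows. The rule list, the head-noun table, and the atype labels are all hypothetical, invented here for illustration; real systems used far larger hand-built rule sets and taxonomies.

```python
import re

# Hypothetical wh-word rules: (pattern, atype). The first match wins.
WH_RULES = [
    (re.compile(r"^how many\b"), "number"),
    (re.compile(r"^who\b"), "person"),
    (re.compile(r"^where\b"), "location"),
    (re.compile(r"^when\b"), "time"),
]

# Hypothetical head-noun rule for questions such as
# "what is the tallest mountain in ...": capture the noun after one modifier.
HEAD_NOUN_RULE = re.compile(r"^what is the \w+ (\w+)")

# Hypothetical mapping from head nouns to atypes in a small taxonomy.
HEAD_NOUN_TO_ATYPE = {"mountain": "mountain", "river": "river", "city": "city"}


def classify(question):
    """Map a question to an atype label, or "unknown" if no rule fires."""
    q = question.strip().lower()
    for pattern, atype in WH_RULES:
        if pattern.search(q):
            return atype
    match = HEAD_NOUN_RULE.search(q)
    if match:
        return HEAD_NOUN_TO_ATYPE.get(match.group(1), "unknown")
    return "unknown"


print(classify("Who discovered radium?"))
print(classify("What is the tallest mountain in Asia?"))
print(classify("Which is the tallest mountain in Asia?"))
```

The last call shows the brittleness: a minor rephrasing ("Which is ..." instead of "What is ...") falls outside every rule, so covering it means adding and maintaining yet another pattern by hand.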

More recently, question classification, following other prominent tasks in
