Features                            Coarse   Fine
Question trigrams                     91.2   77.6
All question q-grams                  87.2   71.8
All question unigrams                 88.4   78.2
Question bigrams                      91.6   79.4
+informer q-grams                     94.0   82.4
+informer hypernyms                   94.2   88.0
Question unigrams + all informer      93.4   88.0
Only informer                         92.2   85.0
Question bigrams + hypernyms          91.6   79.4
FIGURE 10.11: Percent accuracy with linear SVMs, “perfect” informer spans, and various feature encodings. The 'Coarse' column is for the 6 top-level UIUC classes and the 'Fine' column is for the 50 second-level classes.
10.3 Scoring Potential Answer Snippets
In Section 10.2 we established that atypes can be inferred from a natural language question with high accuracy. The atype extraction step is an important part of question preprocessing, because it lets us partition question tokens into

- tokens that express the user's information need as a type to be instantiated, but which need not literally appear in a correct response document or snippet, and
- tokens that the user expects to literally match correct response documents or snippets; we call these selector tokens.
For example, the question “What is the distance between Paris and Rome?” gets partitioned into

- Atype: NUMBER:distance (UIUC system) or distance#n#3 (WordNet system)
- Selectors: Paris and Rome, which can be used to shortlist documents and snippets that qualify to be scored
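To make this partition concrete, here is a minimal Python sketch of the preprocessing output; the class and field names (PreprocessedQuestion, atype, selectors) are illustrative, not taken from any particular QA system:

```python
from dataclasses import dataclass

@dataclass
class PreprocessedQuestion:
    atype: str            # inferred answer type, e.g., "NUMBER:distance"
    selectors: list[str]  # tokens expected to appear literally in answers

# "What is the distance between Paris and Rome?"
q = PreprocessedQuestion(
    atype="NUMBER:distance",      # UIUC label; "distance#n#3" under WordNet
    selectors=["Paris", "Rome"],  # used to shortlist candidate snippets
)
```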
In this section we set up a machine learning framework to assign scores to
snippets that potentially answer the question.
In traditional Information Retrieval, the extent of match between the query
q and a candidate document d is often measured as the cosine of the angle
between q and d represented as vectors in the Vector Space Model (33). Each
word in the lexicon is represented by an axis in the vector space.
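Concretely, the cosine measure is the inner product of the two vectors divided by the product of their lengths. The sketch below uses raw term counts for brevity; practical systems weight each axis, for example by TF-IDF:

```python
import math
from collections import Counter

def cosine(q_tokens, d_tokens):
    """Cosine of the angle between query and document term-count vectors."""
    q, d = Counter(q_tokens), Counter(d_tokens)
    dot = sum(q[t] * d[t] for t in q)  # inner product over shared terms
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0

# Score a candidate snippet against the question's terms.
print(cosine(["distance", "paris", "rome"],
             ["the", "distance", "from", "paris", "to", "rome", "is", "1105", "km"]))
```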