In standard IR, other things being equal, if a query term occurs more frequently in document d_1 than in d_2, then d_1 gets a somewhat larger score than d_2. In our setting, it is unclear whether multiple occurrences of a selector should activate the candidate position any more than a single occurrence would. In our experiments, we simply ignored all but the nearest occurrence of each selector, in effect combining the occurrences of a single selector with max. Sum (Σ) behaves poorly as the within-selector combination, because even a low-IDF selector can boost the score of a non-answer candidate token if it appears a few times near the candidate. Apart from max and Σ, it might be worthwhile to experiment with very slow-growing functions of the selector multiplicity. For combining the activations contributed by different selectors, on the other hand, sum performs quite well, i.e., we add up the activations from different selectors. Here, too, some degree of non-linearity may be worth exploring.
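To make the two combination steps concrete, here is a minimal sketch: within one selector only the nearest occurrence counts (a max over occurrences), and the resulting activations are summed across selectors. The `decay` function and the IDF weighting used below are illustrative assumptions, not the exact energy definition from the text.

```python
# Illustrative sketch of the two combination choices discussed above.
# `idf` and `decay` are assumed stand-ins for a selector's energy.

def selector_activation(candidate_pos, occurrence_positions, idf,
                        decay=lambda gap: 1.0 / (1.0 + gap)):
    """Activation contributed by ONE selector: only its nearest occurrence
    counts, i.e., a max over occurrences. Summing over occurrences instead
    would let a repeated low-IDF selector inflate a non-answer candidate."""
    return idf * max(decay(abs(candidate_pos - p)) for p in occurrence_positions)

def candidate_score(candidate_pos, selectors):
    """Across DIFFERENT selectors we simply add the activations."""
    return sum(selector_activation(candidate_pos, positions, idf)
               for idf, positions in selectors)

# Example: candidate token at position 12, two selectors given as
# (idf, list of occurrence positions).
print(candidate_score(12, [(2.3, [10, 15]), (0.4, [11, 13, 14])]))
```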
10.3.2 Learning the Proximity Scoring Function
For simplicity, we will limit our attention to the W tokens to the left and right of the candidate position, numbered 0 in Figure 10.13. If the word/term at offset o is t_o, we can rewrite (10.2) as

score(a) = \sum_{o=-W}^{W} \beta_o \, \underbrace{\mathrm{energy}(t_o)\, \mathrm{nearest?}(t_o, o, a)}_{=\, x_o} = \beta^\top x        (10.3)
where nearest?(t, o, a) is 1 if the nearest occurrence of word t to candidate a is at offset o, and 0 otherwise. Ties are broken arbitrarily. In the final dot-product form, x, β ∈ ℝ^{2W+1}.
In our implementation we made a few further simplifications. First, we prevented the candidate token from endorsing itself, even if it was also a selector. Consider the question “Which person designed the Panama Canal?” with atype person#n#1. We are certainly not interested in an answer token person. Therefore, o = 0 is excluded from the sum above. Second, we ignore the distinction between tokens to the left and right of a, i.e., we constrain β_{−o} = β_o and add up x_{−o} and x_o suitably. This means, in our implementation, x, β ∈ ℝ^W.
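The following sketch builds the folded W-dimensional feature vector just described: the candidate position itself (o = 0) is skipped, left and right offsets are merged, and each selector contributes its energy only at the offset of its nearest occurrence. The `energy` callable is an assumed parameter standing in for the text's activation weighting, not its exact definition.

```python
import numpy as np

def candidate_features(tokens, a, W, energy):
    """Feature vector x in R^W for candidate position `a` (equation (10.3)
    with the two simplifications: o = 0 excluded, beta_{-o} = beta_o).

    `energy` is an assumed callable mapping a token to its activation
    weight (e.g., IDF-based); its exact form is not specified here.
    """
    # Nearest |offset| (1..W) at which each distinct token occurs around `a`.
    nearest = {}
    for o in range(-W, W + 1):
        pos = a + o
        if o == 0 or pos < 0 or pos >= len(tokens):
            continue  # skip the candidate itself and out-of-range offsets
        t = tokens[pos]
        nearest[t] = min(nearest.get(t, W + 1), abs(o))

    # nearest?(t, o, a) = 1 only at each token's nearest offset; ties between
    # -o and +o fold into the same bucket, so the arbitrary tie-break is moot.
    x = np.zeros(W)
    for t, o in nearest.items():
        x[o - 1] += energy(t)
    return x

def proximity_score(x, beta):
    """Equation (10.3) in its final dot-product form, score(a) = beta . x."""
    return float(beta @ x)
```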
Suppose x_+ is the feature vector corresponding to a snippet where position a is indeed an answer to the query. Let x_− be a feature vector representing a snippet that does not contain an answer. Then we want our scoring model β to satisfy β^\top x_+ > β^\top x_−. Suppose relevance feedback is available in the form of a set of preference pairs i ≺ j, meaning that candidate position i should appear lower in the ranked list than position j. This is now similar to Joachims's RankSVM setting (21), and we can use his SVM formulation:
\min_{\beta,\, s \ge 0} \;\; \tfrac{1}{2}\, \beta^\top \beta + C \sum_{i \prec j} s_{ij} \quad \text{s.t.} \quad \forall\, i \prec j: \;\; \beta^\top x_i + 1 \le \beta^\top x_j + s_{ij}        (10.4)
As with support vector classifiers, C is a tuned parameter that trades off the model complexity ‖β‖ against violations of the snippet ordering requirements.
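A common way to solve a problem of the form (10.4) is Joachims's pairwise reduction: each preference i ≺ j becomes a difference vector that a linear SVM must push past the margin. The sketch below uses scikit-learn's LinearSVC as a stand-in for SVM-light; the function name and data layout are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_rank_svm(X, preference_pairs, C=1.0):
    """Learn beta from (10.4)-style preferences.

    X[i] is the feature vector of candidate position i;
    preference_pairs contains (i, j) meaning i should rank below j.
    Each pair yields the difference x_j - x_i, which the learned beta
    should score positive (and its mirror negative), recovering the
    constraint beta.x_j >= beta.x_i + 1 - s_ij up to the hinge slack.
    """
    diffs, labels = [], []
    for i, j in preference_pairs:
        diffs.append(X[j] - X[i]); labels.append(+1)
        diffs.append(X[i] - X[j]); labels.append(-1)  # mirrored for a two-class problem
    svm = LinearSVC(C=C, loss="hinge", dual=True, fit_intercept=False)
    svm.fit(np.asarray(diffs), np.asarray(labels))
    return svm.coef_.ravel()  # beta; rank candidate positions by beta @ x
```

At query time, candidate positions are then ranked by the learned dot product β^\top x, exactly as in (10.3).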