In our snippet search application, a query may have a handful of positive
response snippets, while the vast expanse of token segments elsewhere in the
corpus supplies negative examples. Intuitively, to train a good scoring function,
it is useless to pick obviously irrelevant snippets. In our experiments, we
picked negative snippets that contained at least one selector, and heuristically
preferred negative snippets that were most similar to positive ones. This
drastically cut down the number of negative snippets. However, the product
of the number of positive and negative snippets, which is the size of the
preference relation ≺ above, was still very large (see Section 10.3.3). With
only 169,662 snippets and several million preference pairs, RankSVM executed
millions of iterations with hundreds of millions of kernel evaluations, and
failed to terminate in a day on a 3 GHz CPU.
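The selection heuristic above can be sketched as follows. This is a hypothetical illustration: `pick_negatives`, the set-of-tokens snippet representation, and the use of cosine similarity as the "most similar to positive" criterion are all assumptions, standing in for whatever representation and similarity the application actually uses.

```python
import numpy as np

def pick_negatives(candidates, selectors, pos_vecs, k):
    """Heuristic negative selection: keep only candidate snippets that
    contain at least one selector token, then prefer those most similar
    to some positive snippet (here, by maximum cosine similarity)."""
    kept = [(toks, vec) for toks, vec in candidates if toks & selectors]
    # Row-normalize the positive snippet vectors once.
    P = pos_vecs / (np.linalg.norm(pos_vecs, axis=1, keepdims=True) + 1e-12)
    def max_sim(vec):
        v = vec / (np.linalg.norm(vec) + 1e-12)
        return float((P @ v).max())
    kept.sort(key=lambda tv: max_sim(tv[1]), reverse=True)
    return kept[:k]

# Toy data: two positive snippet vectors and three candidate negatives.
selectors = {"protein", "binds"}
pos_vecs = np.array([[1.0, 0.0], [0.9, 0.1]])
candidates = [
    ({"protein", "cell"}, np.array([0.8, 0.2])),  # has selector, similar
    ({"cell", "tissue"},  np.array([0.9, 0.1])),  # no selector: filtered out
    ({"binds", "x"},      np.array([0.0, 1.0])),  # has selector, dissimilar
]
kept = pick_negatives(candidates, selectors, pos_vecs, k=2)
```

Here `kept` retains the two selector-containing candidates, ordered so that the one most similar to a positive snippet comes first.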
Optimization (10.4) can be written as

$$\min_\beta \; \tfrac{1}{2}\beta^\top\beta \;+\; C \sum_{i \prec j} \max\bigl\{0,\; 1 - (\beta^\top x_j - \beta^\top x_i)\bigr\},$$

which, because $e^{-t} \ge \max\{0, 1-t\}$, can be bounded by

$$\min_\beta \; \tfrac{1}{2}\beta^\top\beta \;+\; C \sum_{i \prec j} \exp\bigl(-(\beta^\top x_j - \beta^\top x_i)\bigr). \tag{10.5}$$
We call this formulation RankExp. A somewhat better approximation to the
hinge loss $\max\{0, 1-t\}$ is $\log(1 + e^{1-t})$, leading to the optimization

$$\min_\beta \; \tfrac{1}{2}\beta^\top\beta \;+\; C \sum_{i \prec j} \log\Bigl(1 + \exp\bigl(1 - (\beta^\top x_j - \beta^\top x_i)\bigr)\Bigr),$$
but we did not see practical differences in the accuracy of the learnt scoring
function. RankExp may be less accurate than RankSVM, but it allows us to use
simpler optimizers such as L-BFGS (28). Moreover, training involves only
sequential scans over the training data, which can therefore reside on disk.
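To sketch why RankExp admits such simple optimizers, the objective (10.5) and its gradient can be written in a few lines of NumPy and handed to an off-the-shelf L-BFGS routine, here SciPy's `minimize` with `method="L-BFGS-B"`. All variable names and the tiny synthetic data are illustrative only; in the application, every (negative $i$, positive $j$) pair for a query forms a preference pair $i \prec j$.

```python
import numpy as np
from scipy.optimize import minimize

def rankexp(beta, X_pos, X_neg, C=1.0):
    """RankExp objective (10.5) and its gradient, with every
    (negative i, positive j) pair treated as a preference pair i < j."""
    s_pos = X_pos @ beta                  # scores of positive snippets
    s_neg = X_neg @ beta                  # scores of negative snippets
    # M[i, j] = beta.x_j - beta.x_i, the margin for pair (neg i, pos j).
    M = s_pos[None, :] - s_neg[:, None]
    E = np.exp(-M)                        # exponential loss per pair
    obj = 0.5 * beta @ beta + C * E.sum()
    # d/dbeta exp(-(beta.x_j - beta.x_i)) = exp(-M[i,j]) * (x_i - x_j)
    grad = beta + C * (E.sum(axis=1) @ X_neg - E.sum(axis=0) @ X_pos)
    return obj, grad

# Toy data: positives drawn with a higher feature mean than negatives.
rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 1.0, size=(5, 4))
X_neg = rng.normal(0.0, 1.0, size=(20, 4))
res = minimize(rankexp, np.zeros(4), args=(X_pos, X_neg, 1.0),
               jac=True, method="L-BFGS-B")
beta = res.x
```

Because the loss and gradient are simple sums over preference pairs, they can also be accumulated in a single sequential pass over training data streamed from disk, as noted above.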
By modifying the model roughness penalty from $\|\beta\|_2^2$ to something else,
we can encourage $\beta$ to have some desirable properties. For example, because
elements of $\beta$ correspond to token offsets, we may believe that adjacent
elements of $\beta$ should not differ drastically. This leads us to the modified,
locally smooth formulation

$$\min_\beta \; \sum_{j=1}^{W} (\beta_j - \beta_{j+1})^2 \;+\; C \sum_{i \prec j} \exp\bigl(-(\beta^\top x_j - \beta^\top x_i)\bigr), \tag{10.6}$$

where we can arbitrarily set $\beta_{W+1} = 0$, because any fixed offset to all $\beta_j$ leaves
the score unchanged.
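Only the penalty term changes between the two formulations: the ridge term $\tfrac{1}{2}\beta^\top\beta$ is replaced by a sum of squared differences between adjacent elements of $\beta$. A minimal sketch of this roughness penalty, assuming $\beta$ is stored as a NumPy vector of length $W$ indexed by token offset:

```python
import numpy as np

def roughness_penalty(beta):
    """Sum_{j=1}^{W} (beta_j - beta_{j+1})^2 with beta_{W+1} fixed at 0,
    as in (10.6)."""
    # Append the fixed boundary value beta_{W+1} = 0, then penalize
    # squared differences between adjacent elements.
    padded = np.append(beta, 0.0)
    return float(np.sum(np.diff(padded) ** 2))

# A smoothly decaying beta is penalized less than a jagged permutation
# of the same values (equal L2 norm, different local smoothness).
smooth = np.array([4.0, 3.0, 2.0, 1.0])
jagged = np.array([4.0, 1.0, 3.0, 2.0])
# roughness_penalty(smooth) = 4.0, roughness_penalty(jagged) = 18.0
```

The example shows why this penalty, unlike the ridge term, distinguishes between weight vectors of identical norm: it favors those whose adjacent token-offset weights vary gradually.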