In our snippet search application, a query may have a handful of positive
response snippets, while the vast expanse of token segments elsewhere in the
corpus supplies negative examples. Intuitively, to train a good scoring function,
it is useless to pick obviously irrelevant snippets. In our experiments, we
picked negative snippets that contained at least one selector, and heuristically
preferred negative snippets that were most similar to positive ones. This
drastically cut down the number of negative snippets. However, the product
of the number of positive and negative snippets, which is the size of the
preference relation ≺ above, was still very large (see Section 10.3.3). With
only 169,662 snippets and several million preference pairs, RankSVM executed
millions of iterations with hundreds of millions of kernel evaluations, and
failed to terminate in a day on a 3 GHz CPU.
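The selection heuristic above can be sketched as follows. This is a hypothetical illustration: `pick_negatives`, the set-of-tokens snippet representation, and the use of cosine similarity as the "most similar to positive" criterion are all assumptions, standing in for whatever representation and similarity the application actually uses.

```python
import numpy as np

def pick_negatives(candidates, selectors, pos_vecs, k):
    """Heuristic negative selection: keep only candidate snippets that
    contain at least one selector token, then prefer those most similar
    to some positive snippet (here, by maximum cosine similarity)."""
    kept = [(toks, vec) for toks, vec in candidates if toks & selectors]
    # Row-normalize the positive snippet vectors once.
    P = pos_vecs / (np.linalg.norm(pos_vecs, axis=1, keepdims=True) + 1e-12)
    def max_sim(vec):
        v = vec / (np.linalg.norm(vec) + 1e-12)
        return float((P @ v).max())
    kept.sort(key=lambda tv: max_sim(tv[1]), reverse=True)
    return kept[:k]

# Toy data: two positive snippet vectors and three candidate negatives.
selectors = {"protein", "binds"}
pos_vecs = np.array([[1.0, 0.0], [0.9, 0.1]])
candidates = [
    ({"protein", "cell"}, np.array([0.8, 0.2])),  # has selector, similar
    ({"cell", "tissue"},  np.array([0.9, 0.1])),  # no selector: filtered out
    ({"binds", "x"},      np.array([0.0, 1.0])),  # has selector, dissimilar
]
kept = pick_negatives(candidates, selectors, pos_vecs, k=2)
```

Here `kept` retains the two selector-containing candidates, ordered so that the one most similar to a positive snippet comes first.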
Optimization (10.4) can be written as

$$\min_\beta \; \tfrac{1}{2}\beta^\top\beta \;+\; C \sum_{i \prec j} \max\bigl\{0,\; 1 - (\beta^\top x_j - \beta^\top x_i)\bigr\},$$

which, because $e^{-t} \ge \max\{0, 1-t\}$, can be bounded by

$$\min_\beta \; \tfrac{1}{2}\beta^\top\beta \;+\; C \sum_{i \prec j} \exp\bigl(-(\beta^\top x_j - \beta^\top x_i)\bigr). \tag{10.5}$$
We call this formulation RankExp. A somewhat better approximation to the
hinge loss $\max\{0, 1-t\}$ is $\log(1 + e^{1-t})$, leading to the optimization

$$\min_\beta \; \tfrac{1}{2}\beta^\top\beta \;+\; C \sum_{i \prec j} \log\Bigl(1 + \exp\bigl(1 - (\beta^\top x_j - \beta^\top x_i)\bigr)\Bigr),$$
but we did not see practical differences in the accuracy of the learnt scoring
function. RankExp may be less accurate than RankSVM, but it allows us to use
simpler optimizers such as L-BFGS (28). Moreover, training involves only
sequential scans over the training data, which can therefore reside on disk.
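To sketch why RankExp admits such simple optimizers, the objective (10.5) and its gradient can be written in a few lines of NumPy and handed to an off-the-shelf L-BFGS routine, here SciPy's `minimize` with `method="L-BFGS-B"`. All variable names and the tiny synthetic data are illustrative only; in the application, every (negative $i$, positive $j$) pair for a query forms a preference pair $i \prec j$.

```python
import numpy as np
from scipy.optimize import minimize

def rankexp(beta, X_pos, X_neg, C=1.0):
    """RankExp objective (10.5) and its gradient, with every
    (negative i, positive j) pair treated as a preference pair i < j."""
    s_pos = X_pos @ beta                  # scores of positive snippets
    s_neg = X_neg @ beta                  # scores of negative snippets
    # M[i, j] = beta.x_j - beta.x_i, the margin for pair (neg i, pos j).
    M = s_pos[None, :] - s_neg[:, None]
    E = np.exp(-M)                        # exponential loss per pair
    obj = 0.5 * beta @ beta + C * E.sum()
    # d/dbeta exp(-(beta.x_j - beta.x_i)) = exp(-M[i,j]) * (x_i - x_j)
    grad = beta + C * (E.sum(axis=1) @ X_neg - E.sum(axis=0) @ X_pos)
    return obj, grad

# Toy data: positives drawn with a higher feature mean than negatives.
rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 1.0, size=(5, 4))
X_neg = rng.normal(0.0, 1.0, size=(20, 4))
res = minimize(rankexp, np.zeros(4), args=(X_pos, X_neg, 1.0),
               jac=True, method="L-BFGS-B")
beta = res.x
```

Because the loss and gradient are simple sums over preference pairs, they can also be accumulated in a single sequential pass over training data streamed from disk, as noted above.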
By modifying the model roughness penalty from $\|\beta\|_2^2$ to something else,
we can encourage $\beta$ to have some desirable properties. For example, because
elements of $\beta$ correspond to token offsets, we may believe that adjacent
elements of $\beta$ should not differ drastically. This leads us to the modified,
locally smooth formulation

$$\min_\beta \; \sum_{j=1}^{W} (\beta_j - \beta_{j+1})^2 \;+\; C \sum_{i \prec j} \exp\bigl(-(\beta^\top x_j - \beta^\top x_i)\bigr), \tag{10.6}$$

where we can arbitrarily set $\beta_{W+1} = 0$, because any fixed offset to all $\beta_j$ leaves
the score unchanged.
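Only the penalty term changes between the two formulations: the ridge term $\tfrac{1}{2}\beta^\top\beta$ is replaced by a sum of squared differences between adjacent elements of $\beta$. A minimal sketch of this roughness penalty, assuming $\beta$ is stored as a NumPy vector of length $W$ indexed by token offset:

```python
import numpy as np

def roughness_penalty(beta):
    """Sum_{j=1}^{W} (beta_j - beta_{j+1})^2 with beta_{W+1} fixed at 0,
    as in (10.6)."""
    # Append the fixed boundary value beta_{W+1} = 0, then penalize
    # squared differences between adjacent elements.
    padded = np.append(beta, 0.0)
    return float(np.sum(np.diff(padded) ** 2))

# A smoothly decaying beta is penalized less than a jagged permutation
# of the same values (equal L2 norm, different local smoothness).
smooth = np.array([4.0, 3.0, 2.0, 1.0])
jagged = np.array([4.0, 1.0, 3.0, 2.0])
# roughness_penalty(smooth) = 4.0, roughness_penalty(jagged) = 18.0
```

The example shows why this penalty, unlike the ridge term, distinguishes between weight vectors of identical norm: it favors those whose adjacent token-offset weights vary gradually.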