testing in each fold) was used.
The next job was to turn contexts into feature vectors. Recall that there must be at least one selector match within $W$ tokens of the candidate $a$. We set up this window with $2W + 1$ tokens centered at $a$, and retained only one instance of each selector, the one closest to $a$. Left-right ties were broken arbitrarily. Obviously, we can also aggregate over multiple occurrences of a selector if warranted.
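The windowing step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `context_features`, the optional per-selector `weight` map, and the choice to measure the gap as the number of tokens strictly between a match and $a$ (so gaps run from 0 to $W - 1$) are all hypothetical choices made for the example.

```python
from typing import Dict, List, Optional, Sequence, Set


def context_features(
    tokens: Sequence[str],
    a: int,
    selectors: Set[str],
    W: int,
    weight: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Turn the context of candidate position `a` into a gap-indexed vector.

    Assumptions (hypothetical, for illustration only):
      * the gap of a match at position i is abs(i - a) - 1, the number of
        tokens strictly between i and a, so gaps run 0 .. W-1;
      * feature j sums weight[s] (default 1.0) over the selectors s whose
        closest occurrence lies at gap j.
    """
    nearest: Dict[str, int] = {}
    # Window of 2W + 1 tokens centred at a, clipped to the token sequence.
    lo, hi = max(0, a - W), min(len(tokens) - 1, a + W)
    for i in range(lo, hi + 1):
        if i == a or tokens[i] not in selectors:
            continue
        s = tokens[i]
        gap = abs(i - a) - 1
        # Keep only the closest occurrence of each selector. Scanning left to
        # right with a strict '<' resolves left-right ties in favour of the
        # left match, which is one arbitrary tie-breaking rule.
        if s not in nearest or gap < nearest[s]:
            nearest[s] = gap
    x = [0.0] * W
    for s, gap in nearest.items():
        x[gap] += 1.0 if weight is None else weight.get(s, 1.0)
    return x
```

For example, with tokens `["x", "cat", "y", "ANS", "z", "cat", "dog"]`, candidate position 3, selectors `{"cat", "dog"}` and $W = 3$, both occurrences of `cat` sit at gap 1 (the left one is kept) and `dog` sits at gap 2, giving `[0.0, 1.0, 1.0]`. A context with no selector inside the window yields an all-zero vector, which a caller would discard given the requirement that at least one selector match.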
10.3.3.2 RankExp performance scaling
On identical datasets, for $C \in [0.01, 0.3]$ in (10.4) and (10.5), the fraction of orderings satisfied by RankSVM and RankExp, as well as the MRRs, were typically within 3% of each other, while RankExp took 14-40 iterations or 10-20 minutes to train and RankSVM took between 2 and 24 hours. A more detailed evaluation is shown in Figure 10.14.
[Figure 10.14: CPU time (vertical axis, 0 to 1,000,000) versus fraction of the training set used (horizontal axis, 0 to 0.3), for RankExp with $C = 0.3$, RankExp with $C = 3$, and RankSVM.]
FIGURE 10.14: Relative CPU times needed by RankSVM and RankExp as a function of the number of ordering constraints.
10.3.3.3 Fitting the decay profile
optimization (10.5), where $\beta_j$ gives the relative importance of a selector match at gap $j$. On smoothing using the optimization in (10.6) instead, we get the values shown as a continuous line. With a suitably cross-validated choice of $C$, the smooth version of $\beta$ gave lower test error than the rough version.
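The difference between a rough and a smooth fit of $\beta$ can be illustrated with a small gradient-descent sketch. Objectives (10.4) to (10.6) are not reproduced in this excerpt, so the code assumes a form: a pairwise exponential loss weighted by $C$, an L2 penalty on $\beta$, and, for the smooth variant only, an extra penalty on squared differences between adjacent $\beta_j$. The function name `fit_decay_profile`, the step size, and the iteration count are hypothetical.

```python
import numpy as np


def fit_decay_profile(X_good, X_bad, C=1.0, smooth=False, lr=0.05, iters=2000):
    """Fit gap weights beta by gradient descent on an ASSUMED objective.

    X_good, X_bad: arrays of shape (m, W); row k pairs the gap-indexed feature
    vector of a correct answer with that of an incorrect one.
    Objective (an assumption, not the book's (10.5)/(10.6)):
        0.5 * ||beta||^2                                  (both variants)
      + 0.5 * sum_j (beta[j+1] - beta[j])^2               (smooth variant only)
      + C * sum_k exp(-beta . (X_good[k] - X_bad[k]))     (pairwise exp loss)
    """
    D = np.asarray(X_good, dtype=float) - np.asarray(X_bad, dtype=float)
    beta = np.zeros(D.shape[1])
    for _ in range(iters):
        margins = D @ beta                         # score(good) - score(bad)
        grad = beta - C * (np.exp(-margins) @ D)   # L2 term + loss gradient
        if smooth:
            diff = np.diff(beta)                   # beta[j+1] - beta[j]
            grad[:-1] -= diff                      # d/d beta[j] of 0.5*diff^2
            grad[1:] += diff                       # d/d beta[j+1] of 0.5*diff^2
        beta -= lr * grad
    return beta
```

With `X_good = np.array([[1.0, 0, 0, 0], [0, 0, 1.0, 0]])` and `X_bad = np.zeros((2, 4))`, the rough fit puts about 0.567 (the solution of $\beta = e^{-\beta}$) on gaps 0 and 2 and exactly 0 on gaps 1 and 3, while the smooth fit spreads positive weight into gaps 1 and 3 and so has a smaller total variation. In this toy setting the smoothness penalty acts as a prior that neighboring gaps matter about equally, which is the intuition behind preferring the smooth version when data are sparse.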
We did not expect the clearly non-monotonic behavior near $j = 0$, and only in hindsight found that this is a property of language (perhaps already appreciated by linguists): selectors are often named entities, and are often connected to the answer token via prepositions and articles that create a gap. This goes against the conventional wisdom that spreading activation should decay monotonically with distance.