testing in each fold) was used.
The next job was to turn contexts into feature vectors. Recall that there
must be at least one selector match within W tokens of the candidate a. We
set up this window with 2W + 1 tokens centered at a, and retained only one
instance of each selector, the one closest to a. Left-right ties were broken
arbitrarily. Obviously, we can also aggregate over multiple occurrences of a
selector if
warrants.
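The windowing step described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function names, the exact-string matching of single-token selectors, and the encoding of the feature vector as counts indexed by gap are all assumptions made for the example.

```python
def closest_selector_gaps(tokens, a, selectors, W):
    """Map each selector to the gap of its closest match within W tokens of position a.

    Hypothetical helper: selectors are assumed to be single tokens matched by
    exact string equality. The gap is the absolute token distance |i - a|.
    """
    closest = {}
    lo = max(0, a - W)
    hi = min(len(tokens) - 1, a + W)
    for i in range(lo, hi + 1):  # the window of (at most) 2W + 1 tokens
        if i == a or tokens[i] not in selectors:
            continue
        gap = abs(i - a)
        # Keep only the closest instance; on a left-right tie the first one
        # seen (the left one) is kept, which is an arbitrary choice.
        if tokens[i] not in closest or gap < closest[tokens[i]]:
            closest[tokens[i]] = gap
    return closest


def gap_feature_vector(tokens, a, selectors, W):
    """Assumed encoding: x[j] counts selectors whose closest match is at gap j.

    Index 0 is unused because the candidate position itself is skipped.
    """
    x = [0] * (W + 1)
    for gap in closest_selector_gaps(tokens, a, selectors, W).values():
        x[gap] += 1
    return x


tokens = "the telephone was invented by Alexander Graham Bell in 1876".split()
selectors = {"telephone", "invented"}
x = gap_feature_vector(tokens, a=5, selectors=selectors, W=4)
```

Here the candidate is the token at position 5 ("Alexander"); "invented" is matched at gap 2 and "telephone" at gap 4, so x is [0, 0, 1, 0, 1].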
10.3.3.2 RankExp performance scaling
On identical datasets, for C ∈ [0.01, 0.3] in (10.4) and (10.5), the fraction
of orderings satisfied by RankSVM and RankExp, as well as the MRRs, were
typically within 3% of each other, while RankExp took 14-40 iterations or
10-20 minutes to train and RankSVM took between 2 and 24 hours. A more
detailed evaluation is shown in Figure 10.14.
[Figure 10.14: plot; x-axis FractionTrainingSize (0 to 0.3), y-axis 0 to 1000000; series Exp,C=0.3, Exp,C=3, and SVM.]
FIGURE 10.14: Relative CPU times needed by RankSVM and RankExp
as a function of the number of ordering constraints.
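The two quality measures compared in this subsection, the fraction of orderings satisfied and the MRR (mean reciprocal rank), can be sketched as below. The function names and input encodings are assumptions for illustration: an ordering constraint is taken to be a pair of item indices (better, worse), and each query is represented by the correctness labels of its candidates in ranked order.

```python
def fraction_orderings_satisfied(scores, constraints):
    """Fraction of (better, worse) index pairs with scores[better] > scores[worse].

    Assumed encoding: `constraints` is a list of index pairs into `scores`.
    Returns 0.0 when there are no constraints.
    """
    if not constraints:
        return 0.0
    satisfied = sum(1 for better, worse in constraints
                    if scores[better] > scores[worse])
    return satisfied / len(constraints)


def mean_reciprocal_rank(ranked_labels):
    """Mean over queries of 1/rank of the first correct candidate.

    `ranked_labels` holds, per query, booleans in ranked order (best first).
    A query with no correct candidate contributes 0.
    """
    total = 0.0
    for labels in ranked_labels:
        for rank, correct in enumerate(labels, start=1):
            if correct:
                total += 1.0 / rank
                break
    return total / len(ranked_labels)


frac = fraction_orderings_satisfied([0.9, 0.2, 0.5], [(0, 1), (2, 1), (1, 0)])
mrr = mean_reciprocal_rank([[False, True], [True], [False, False]])
```

In the toy input, two of the three constraints are satisfied, and the three queries contribute reciprocal ranks 1/2, 1, and 0.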
10.3.3.3 Fitting the decay profile
The scatter of dots in Figure 10.15 shows a typical β vector obtained from
optimization (10.5), where β_j gives the relative importance of a selector match
at gap j. On smoothing using the optimization in (10.6) instead, we get the
values shown as a continuous line. With a suitably cross-validated choice of
C, the smooth version of β gave lower test error than the rough version.
We did not expect the clearly non-monotonic behavior near j = 0, and
only in hindsight found that this is a property of language (perhaps already
appreciated by linguists): selectors are often named entities, and are often
connected to the answer token via prepositions and articles that create a
gap. This goes against conventional wisdom that spreading activation should
monotonically decay with distance.
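To give a feel for what smoothing a rough β vector does, the sketch below applies a generic roughness penalty after the fact. It is a swapped-in technique, not optimization (10.6), whose exact form is not reproduced in this section: we assume a fitted vector beta is already available and minimize the sum of (b_j - β_j)^2 plus lam times the sum of (b_{j+1} - b_j)^2. The name smooth_adjacent, the weight lam, and the sample values are hypothetical.

```python
def smooth_adjacent(beta, lam):
    """Return b minimizing sum_j (b_j - beta_j)^2 + lam * sum_j (b_{j+1} - b_j)^2.

    The normal equations are tridiagonal, (I + lam * D^T D) b = beta, with D
    the first-difference operator, and are solved with the Thomas algorithm.
    """
    n = len(beta)
    if n < 2:
        return list(beta)
    # Diagonal of I + lam * D^T D: 1 + lam at the two ends, 1 + 2*lam inside.
    diag = [1.0 + 2.0 * lam] * n
    diag[0] = diag[-1] = 1.0 + lam
    off = -lam  # every sub- and super-diagonal entry

    # Forward elimination.
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = beta[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (beta[i] - off * d[i - 1]) / denom

    # Back substitution.
    b = [0.0] * n
    b[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        b[i] = d[i] - c[i] * b[i + 1]
    return b


def roughness(v):
    """Sum of squared differences between adjacent entries."""
    return sum((v[i + 1] - v[i]) ** 2 for i in range(len(v) - 1))


rough_beta = [0.2, 1.0, 0.1, 0.9, 0.3, 0.2]  # made-up values, not from Figure 10.15
smooth_beta = smooth_adjacent(rough_beta, lam=2.0)
```

A penalty of this kind only discourages jumps between adjacent gaps; it does not force the profile to be monotone, so a dip near j = 0 like the one discussed above can survive smoothing. As with C in the text, lam would need to be chosen by cross-validation.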
 