[Figure labels: selectors (closest and second-closest occurrences of the stem “invent”); “person”, is-a; token positions 6, 5, 4, 3, 2, 1, 0, +1, +2; candidate position to score.]
FIGURE 10.13 (SEE COLOR INSERT FOLLOWING PAGE 130.): Setting up the proximity scoring problem.
document frequency or IDF standard in IR: the number $N$ of documents
in the corpus divided by the number $N_s$ of documents containing the selector
token $s$. This is a linear form of IDF. We implemented the more commonly
used logarithmic form $\log(1 + N/N_s)$.
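As a minimal sketch of the two IDF variants described above, assume a toy corpus held as a list of token sets. The function names `doc_freq`, `linear_idf`, and `log_idf` are illustrative and do not come from the text, and the natural logarithm is an assumption, since the text does not state a base.

```python
import math

def doc_freq(corpus, token):
    """Return N_s, the number of documents that contain `token`."""
    return sum(1 for doc in corpus if token in doc)

def linear_idf(corpus, token):
    """Linear IDF: N / N_s."""
    n_s = doc_freq(corpus, token)
    if n_s == 0:
        raise ValueError(f"token {token!r} does not occur in the corpus")
    return len(corpus) / n_s

def log_idf(corpus, token):
    """Logarithmic IDF: log(1 + N / N_s), natural log assumed."""
    return math.log(1 + linear_idf(corpus, token))

# Hypothetical four-document corpus; each document is a set of stemmed tokens.
corpus = [
    {"who", "invent", "television"},
    {"television", "show"},
    {"invent", "telephone"},
    {"radio"},
]
```

Here "invent" occurs in two of the four documents, so its linear IDF is 4/2 and its logarithmic IDF is log(1 + 2). The logarithm compresses the range, so very rare tokens do not dominate the score as strongly as they would under the linear form.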
In many graph-based scoring systems such as ObjectRank (3), XRank
(15), or TeXQuery (1), it is common to use a monotone decreasing parametric
form $\mathrm{decay}(g) = \delta^{g}$, where $0 < \delta < 1$ is a magic decay
factor. In Figure 10.13, $\mathrm{decay}(g)$ is shown as a strictly decreasing
function. However, as we shall see, other shapes of $\mathrm{decay}(\cdot)$
may match data more closely.
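The exponential form can be sketched in a few lines. The name `exponential_decay`, the default value of `delta`, and the use of the absolute gap (so that selectors on either side of the candidate are treated alike) are assumptions made for illustration, not choices stated in the text.

```python
def exponential_decay(gap, delta=0.5):
    """Return delta ** |gap| for a decay factor 0 < delta < 1.

    `gap` is the token distance between a selector occurrence and the
    candidate; taking its absolute value is an assumption of this sketch.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return delta ** abs(gap)
```

With `delta=0.5`, a selector adjacent to the candidate contributes 0.5, one three tokens away contributes 0.125, and the contribution keeps halving with each extra token of distance.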
10.3.1.2 Aggregating over many selectors
Next we need to decide how to aggregate the activation from more than
one distinct selector or more than one occurrence of a selector. A selector s
can appear multiple times near a candidate; we call this set $\{s_i\}$. If $a$
is the candidate, our generic scoring function looks like

$$\mathrm{score}(a) = \bigoplus_{s} \bigotimes_{s_i} \mathrm{energy}(s)\, \mathrm{decay}(\mathrm{gap}(s_i, a)), \qquad (10.1)$$

where $\otimes$ aggregates over multiple occurrences of $s$ and $\oplus$
aggregates over different selectors. If $\otimes$ distributes over
multiplication, we can write

$$\mathrm{score}(a) = \bigoplus_{s} \mathrm{energy}(s) \bigotimes_{i} \mathrm{decay}(\mathrm{gap}(s_i, a)). \qquad (10.2)$$
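The aggregation scheme of Equations (10.1) and (10.2) can be sketched with pluggable aggregators. Everything named here (`score`, `score_factored`, the `inner` and `outer` parameters, the example positions and energies) is hypothetical; the text does not fix the two aggregation operators, so `sum` and `max` are used only as plausible instances.

```python
def score(candidate_pos, occurrences, energy, decay, inner=sum, outer=sum):
    """Form (10.1): aggregate energy(s) * decay(gap) over occurrences, then over selectors.

    occurrences maps each selector to the token positions where it appears;
    energy maps each selector to its (non-negative) energy, e.g. an IDF value.
    """
    per_selector = [
        inner(energy[s] * decay(abs(p - candidate_pos)) for p in positions)
        for s, positions in occurrences.items()
    ]
    return outer(per_selector)

def score_factored(candidate_pos, occurrences, energy, decay, inner=sum, outer=sum):
    """Form (10.2): pull energy(s) out of the inner aggregate.

    Valid only when `inner` distributes over multiplication by energy(s),
    which holds for sum, and for max when energies are non-negative.
    """
    per_selector = [
        energy[s] * inner(decay(abs(p - candidate_pos)) for p in positions)
        for s, positions in occurrences.items()
    ]
    return outer(per_selector)

# Hypothetical data: two occurrences of "invent", one of "television".
occurrences = {"invent": [4, 9], "television": [7]}
energy = {"invent": 2.0, "television": 1.0}
halving = lambda gap: 0.5 ** gap  # exponential decay with delta = 0.5

total_sum = score(10, occurrences, energy, halving, inner=sum)
total_max = score(10, occurrences, energy, halving, inner=max)
```

Using `sum` as the inner aggregate lets every occurrence of a selector contribute, while `max` counts only the occurrence nearest the candidate. In both cases the factored form returns the same value as the unfactored one, which is what makes (10.2) a legitimate rewrite of (10.1).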
 