Information Technology Reference
In-Depth Information
3.2.2 A Generalized Subsequence Kernel
Let Σ 1 2 , ..., Σ k be some disjoint feature spaces. Following the example in Sec-
tion 3.2.1, Σ 1 could be the set of words, Σ 2 the set of POS tags, etc. Let
Σ × = Σ 1 × Σ 2 × ... × Σ k be the set of all possible feature vectors, where a fea-
ture vector would be associated with each position in a sentence. Given two feature
vectors x, y ∈ Σ × ,let c ( x, y ) denote the number of common features between x and
y . The next notation follows that introduced in [2]. Thus, let s, t be two sequences
over the finite set Σ × , and let |s| denote the length of s = s 1 ...s |s| . The sequence
s [ i : j ] is the contiguous subsequence s i ...s j of s .Let i =( i 1 , ..., i | i | ) be a sequence of
| i | indices in s , in ascending order. We define the length l ( i ) of the index sequence i
in s as i | i | − i 1 + 1. Similarly, j is a sequence of | j | indices in t .
Let Σ = Σ 1 ∪ Σ 2 ∪ ... ∪ Σ k be the set of all possible features. We say that
the sequence u ∈ Σ is a (sparse) subsequence of s if there is a sequence of |u|
indices i such that u k ∈ s i k , for all k =1 , ..., |u| . Equivalently, we write u ≺ s [ i ]as
a shorthand for the component-wise ' ' relationship between u and s [ i ].
Finally, let K n ( s, t, λ ) (Equation 3.1) be the number of weighted sparse subse-
quences u of length n common to s and t (i.e., u ≺ s [ i ], u ≺ t [ j ]), where the weight
of u is λ l ( i )+ l ( j ) , for some λ ≤ 1.
K n ( s, t, λ )=
u∈Σ
λ l ( i )+ l ( j )
(3.1)
i : u≺s [ i ]
j : u≺t [ j ]
Let i and j be two index sequences of length n . By definition, for every k between
1 and n , c ( s i k ,t j k ) returns the number of common features between s and t at
positions i k and j k .If c ( s i k ,t j k ) = 0 for some k , there are no common feature
sequences of length n between s [ i ] and t [ j ]. On the other hand, if c ( s i k ,t j k ) is greater
than 1, this means that there is more than one common feature that can be used
at position k to obtain a common feature sequence of length n . Consequently, the
number of common feature sequences of length n between s [ i ] and t [ j ], i.e., the size of
the set {u ∈ Σ n
|u ≺ s [ i ] ,u≺ t [ j ] } ,isgivenby k =1 c ( s i k ,t j k ). Therefore, K n ( s, t, λ )
can be rewritten as in Equation 3.2:
K n ( s, t, λ )=
i : | i | =
n
c ( s i k ,t j k ) λ l ( i )+ l ( j )
(3.2)
n
j : | j | =
n
k =1
We use λ as a decaying factor that penalizes longer subsequences. For sparse sub-
sequences, this means that wider gaps will be penalized more, which is exactly the
desired behavior for our patterns. Through them, we try to capture head-modifier
dependencies that are important for relation extraction; for lack of reliable depen-
dency information, the larger the word gap is between two words, the less confident
we are in the existence of a head-modifier relationship between them.
To enable an e cient computation of K n , we use the auxiliary function K n with
a definition similar to K n , the only difference being that it counts the length from
the beginning of the particular subsequence u to the end of the strings s and t ,as
illustrated in Equation 3.3:
K n ( s, t, λ )=
u∈Σ
λ |s| + |t|−i 1 −j 1 +2
(3.3)
i : u≺s [ i ]
j : u≺t [ j ]
Search WWH ::




Custom Search