Information Technology Reference
In-Depth Information
3.2.2 A Generalized Subsequence Kernel
Let
Σ
1
,Σ
2
, ..., Σ
k
be some disjoint feature spaces. Following the example in Sec-
tion 3.2.1,
Σ
1
could be the set of words,
Σ
2
the set of POS tags, etc. Let
Σ
×
=
Σ
1
× Σ
2
× ... × Σ
k
be the set of all possible feature vectors, where a fea-
ture vector would be associated with each position in a sentence. Given two feature
vectors
x, y ∈ Σ
×
,let
c
(
x, y
) denote the number of common features between
x
and
y
. The next notation follows that introduced in [2]. Thus, let
s, t
be two sequences
over the finite set
Σ
×
, and let
|s|
denote the length of
s
=
s
1
...s
|s|
. The sequence
s
[
i
:
j
] is the contiguous subsequence
s
i
...s
j
of
s
.Let
i
=(
i
1
, ..., i
|
i
|
) be a sequence of
|
i
|
indices in
s
, in ascending order. We define the length
l
(
i
)
of the index sequence
i
in s
as
i
|
i
|
− i
1
+ 1. Similarly,
j
is a sequence of
|
j
|
indices in
t
.
Let
Σ
∪
=
Σ
1
∪ Σ
2
∪ ... ∪ Σ
k
be the set of all possible features. We say that
the sequence
u ∈ Σ
∪
is a (sparse) subsequence of
s
if there is a sequence of
|u|
indices
i
such that
u
k
∈ s
i
k
, for all
k
=1
, ..., |u|
. Equivalently, we write
u ≺ s
[
i
]as
a shorthand for the component-wise '
∈
' relationship between
u
and
s
[
i
].
Finally, let
K
n
(
s, t, λ
) (Equation 3.1) be the number of weighted sparse subse-
quences
u
of length
n
common to
s
and
t
(i.e.,
u ≺ s
[
i
],
u ≺ t
[
j
]), where the weight
of
u
is
λ
l
(
i
)+
l
(
j
)
, for some
λ ≤
1.
K
n
(
s, t, λ
)=
u∈Σ
∪
λ
l
(
i
)+
l
(
j
)
(3.1)
i
:
u≺s
[
i
]
j
:
u≺t
[
j
]
Let
i
and
j
be two index sequences of length
n
. By definition, for every
k
between
1 and
n
,
c
(
s
i
k
,t
j
k
) returns the number of common features between
s
and
t
at
positions
i
k
and
j
k
.If
c
(
s
i
k
,t
j
k
) = 0 for some
k
, there are no common feature
sequences of length
n
between
s
[
i
] and
t
[
j
]. On the other hand, if
c
(
s
i
k
,t
j
k
) is greater
than 1, this means that there is more than one common feature that can be used
at position
k
to obtain a common feature sequence of length
n
. Consequently, the
number of common feature sequences of length
n
between
s
[
i
] and
t
[
j
], i.e., the size of
the set
{u ∈ Σ
n
∪
|u ≺ s
[
i
]
,u≺ t
[
j
]
}
,isgivenby
k
=1
c
(
s
i
k
,t
j
k
). Therefore,
K
n
(
s, t, λ
)
can be rewritten as in Equation 3.2:
K
n
(
s, t, λ
)=
i
:
|
i
|
=
n
c
(
s
i
k
,t
j
k
)
λ
l
(
i
)+
l
(
j
)
(3.2)
n
j
:
|
j
|
=
n
k
=1
We use
λ
as a decaying factor that penalizes longer subsequences. For sparse sub-
sequences, this means that wider gaps will be penalized more, which is exactly the
desired behavior for our patterns. Through them, we try to capture head-modifier
dependencies that are important for relation extraction; for lack of reliable depen-
dency information, the larger the word gap is between two words, the less confident
we are in the existence of a head-modifier relationship between them.
To enable an e
cient computation of
K
n
, we use the auxiliary function
K
n
with
a definition similar to
K
n
, the only difference being that it counts the length from
the beginning of the particular subsequence
u
to the end of the strings
s
and
t
,as
illustrated in Equation 3.3:
K
n
(
s, t, λ
)=
u∈Σ
∪
λ
|s|
+
|t|−i
1
−j
1
+2
(3.3)
i
:
u≺s
[
i
]
j
:
u≺t
[
j
]
Search WWH ::
Custom Search