Extracting Relations from Text: From Word Sequences to Dependency Paths - Natural Language Processing and Text Mining

Information Technology Reference

In-Depth Information

3.2.2 A Generalized Subsequence Kernel

Let Σ 1 ,Σ 2 , ..., Σ k be some disjoint feature spaces. Following the example in Sec-

tion 3.2.1, Σ 1 could be the set of words, Σ 2 the set of POS tags, etc. Let

Σ × = Σ 1 × Σ 2 × ... × Σ k be the set of all possible feature vectors, where a fea-

ture vector would be associated with each position in a sentence. Given two feature

vectors x, y ∈ Σ × ,let c ( x, y ) denote the number of common features between x and

y . The next notation follows that introduced in [2]. Thus, let s, t be two sequences

over the finite set Σ × , and let |s| denote the length of s = s 1 ...s |s| . The sequence

s [ i : j ] is the contiguous subsequence s i ...s j of s .Let i =( i 1 , ..., i | i | ) be a sequence of

| i | indices in s , in ascending order. We define the length l ( i ) of the index sequence i

in s as i | i | − i 1 + 1. Similarly, j is a sequence of | j | indices in t .

Let Σ ∪ = Σ 1 ∪ Σ 2 ∪ ... ∪ Σ k be the set of all possible features. We say that

the sequence u ∈ Σ ∪ is a (sparse) subsequence of s if there is a sequence of |u|

indices i such that u k ∈ s i k , for all k =1 , ..., |u| . Equivalently, we write u ≺ s [ i ]as

a shorthand for the component-wise ' ∈ ' relationship between u and s [ i ].

Finally, let K n ( s, t, λ ) (Equation 3.1) be the number of weighted sparse subse-

quences u of length n common to s and t (i.e., u ≺ s [ i ], u ≺ t [ j ]), where the weight

of u is λ l ( i )+ l ( j ) , for some λ ≤ 1.

K n ( s, t, λ )=

u∈Σ ∪

λ l ( i )+ l ( j )

(3.1)

i : u≺s [ i ]

j : u≺t [ j ]

Let i and j be two index sequences of length n . By definition, for every k between

1 and n , c ( s i k ,t j k ) returns the number of common features between s and t at

positions i k and j k .If c ( s i k ,t j k ) = 0 for some k , there are no common feature

sequences of length n between s [ i ] and t [ j ]. On the other hand, if c ( s i k ,t j k ) is greater

than 1, this means that there is more than one common feature that can be used

at position k to obtain a common feature sequence of length n . Consequently, the

number of common feature sequences of length n between s [ i ] and t [ j ], i.e., the size of

the set {u ∈ Σ n

∪ |u ≺ s [ i ] ,u≺ t [ j ] } ,isgivenby k =1 c ( s i k ,t j k ). Therefore, K n ( s, t, λ )

can be rewritten as in Equation 3.2:

K n ( s, t, λ )=

i : | i | =

c ( s i k ,t j k ) λ l ( i )+ l ( j )

(3.2)

j : | j | =

k =1

We use λ as a decaying factor that penalizes longer subsequences. For sparse sub-

sequences, this means that wider gaps will be penalized more, which is exactly the

desired behavior for our patterns. Through them, we try to capture head-modifier

dependencies that are important for relation extraction; for lack of reliable depen-

dency information, the larger the word gap is between two words, the less confident

we are in the existence of a head-modifier relationship between them.

To enable an e cient computation of K n , we use the auxiliary function K n with

a definition similar to K n , the only difference being that it counts the length from

the beginning of the particular subsequence u to the end of the strings s and t ,as

illustrated in Equation 3.3:

K n ( s, t, λ )=

u∈Σ ∪

λ |s| + |t|−i 1 −j 1 +2

(3.3)

i : u≺s [ i ]

j : u≺t [ j ]

Natural Language Processing and Text Mining

Search WWH ::

Custom Search

Home