Information Technology Reference
In-Depth Information
5.3 Syntactic Tree Kernels (STK)
Convolution tree kernels [2] compute the similarity between two trees T 1 and
T 2 by counting the common sub-trees, without enumerating the whole fragment
space. In more detail, let N 1 and N 2 be the set of nodes in T 1 and T 2 , respectively.
Moreover, let I i ( n ) be an indicator variable that is 1 if subtree i is rooted at n
and 0 otherwise. Then the convolution kernel K over T 1and T 2 is computed as:
STK ( T 1 ,T 2) =
n 1 ∈N 1 ,n 2 ∈N 2
Δ ( n 1 ,n 2 )
(4)
where
Δ ( n 1 ,n 2 )=
n 1 ∈N 1
I i ( n 1 ) I i ( n 2 )
i
n 2 ∈N 2
is computed eciently using the following recursive definition:
- If the production rules 4 at n 1 and n 2 are different, then Δ ( n 1 ,n 2 )=0.
- If the production rules at n 1 and n 2 are the same and n 1 and n 2 are pre-
terminals, then Δ ( n 1 ,n 2 )= λ .
- If the production rules at n 1 and n 2 are the same and n 1 and n 2 are not
pre-terminals, then:
nc ( n 1)
Δ ( n 1 ,n 2 )= λ
(1 + Δ ( ch ( n 1 ,j )) ,ch ( n 2 ,j ))
j =1
where nc ( n 1 ) is the number of children of n 1 inthetreeandthej-thchildren
of node n i is denoted by ch ( n i ,j ) (note that nc ( n 1 )= nc ( n 2 ) since the
production rule is the same). λ (0 <λ< 1) is a decay factor to make the
kernel less variable with respect to tree-fragment sizes.
5.4 Kernel Combination for Pairs
We need to represent the members of a pair and their interdependencies. For
this purpose, given two kernel functions, k 1 ( ., . )and k 2 ( ., . ), and two pairs, p 1 =
n 1 , s 1
and p 2 =
n 2 , s 2
, a first approximation is given by summing the kernels
applied to the components: K ( p 1 ,p 2 )= k 1 ( n 1 ,n 2 )+ k 2 ( s 1 ,s 2 ). This kernel will
produce the union of the feature spaces of questions and queries. A more effective
kernel is the product k ( n 1 ,n 2 ) × k ( s 1 ,s 2 ), since it generates pairs of fragments,
which are member of the Cartesian product of kernel spaces of the questions
and queries. As additional feature and kernel engineering, we also exploit the
ability of the polynomial kernel to add feature conjunctions. By simply applying
the function (1 + K ( p 1 ,p 2 )) d , we can generate conjunction up to d features.
Thus, we can obtain tree fragment conjunctions and conjunctions of pairs of
tree fragments.
The next section will show how to use such kernels for an SVM-based reranker.
4 In a syntactic tree a node with its children correspond to a production rule of the
grammar that generated it.
 
Search WWH ::




Custom Search