Example 9.2 Nonlinear transformation of original input data into a higher dimensional space.
Consider the following example. A 3-D input vector $X = (x_1, x_2, x_3)$ is mapped into a 6-D space, $Z$, using the mappings $\phi_1(X) = x_1$, $\phi_2(X) = x_2$, $\phi_3(X) = x_3$, $\phi_4(X) = (x_1)^2$, $\phi_5(X) = x_1 x_2$, and $\phi_6(X) = x_1 x_3$. A decision hyperplane in the new space is $d(Z) = WZ + b$, where $W$ and $Z$ are vectors. This is linear. We solve for $W$ and $b$ and then substitute back so that the linear decision hyperplane in the new ($Z$) space corresponds to a nonlinear second-order polynomial in the original 3-D input space:

$$d(Z) = w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 (x_1)^2 + w_5 x_1 x_2 + w_6 x_1 x_3 + b = w_1 z_1 + w_2 z_2 + w_3 z_3 + w_4 z_4 + w_5 z_5 + w_6 z_6 + b.$$
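To make the mapping concrete, here is a minimal Python sketch (assuming NumPy; the weight values and bias below are arbitrary placeholders, not values solved for in the text) that applies the six mappings of this example and checks that the linear decision function in the $Z$ space agrees with the second-order polynomial in the original 3-D space:

import numpy as np

def phi(x):
    # The six mappings of Example 9.2: (x1, x2, x3) -> (x1, x2, x3, x1^2, x1*x2, x1*x3)
    x1, x2, x3 = x
    return np.array([x1, x2, x3, x1**2, x1*x2, x1*x3])

# Placeholder weights W and bias b (in practice these are solved for by the SVM).
w = np.array([0.5, -1.0, 2.0, 0.3, -0.7, 1.2])
b = 0.1

x = np.array([1.0, 2.0, 3.0])   # original 3-D input
z = phi(x)                      # its image in the 6-D Z space

d_linear = np.dot(w, z) + b     # linear hyperplane in Z space
d_poly = (w[0]*x[0] + w[1]*x[1] + w[2]*x[2]          # same value as a second-order
          + w[3]*x[0]**2 + w[4]*x[0]*x[1]            # polynomial in the original
          + w[5]*x[0]*x[2] + b)                      # input space

assert np.isclose(d_linear, d_poly)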
But there are some problems. First, how do we choose the nonlinear mapping to
a higher dimensional space? Second, the computation involved will be costly. Refer to
Eq. (9.19) for the classification of a test tuple, $X^T$. Given the test tuple, we have to compute its dot product with every one of the support vectors.$^3$ In training, we have to compute a similar dot product several times in order to find the MMH. This is especially expensive: the required dot product computations become very heavy and costly. We need another trick!
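To see where these dot products arise at classification time, here is a small Python sketch (assuming NumPy, and assuming Eq. (9.19) takes the usual support-vector expansion form $d(X^T) = \sum_i y_i \alpha_i X_i \cdot X^T + b_0$; the function name and arguments are ours, not from the text). Note that it computes one dot product per support vector for every test tuple:

import numpy as np

def classify(x_test, support_vectors, labels, alphas, b0):
    # d(X^T) = sum_i y_i * alpha_i * (X_i . X^T) + b0 -- one dot product
    # with each support vector, which is what makes this costly.
    d = b0
    for x_i, y_i, a_i in zip(support_vectors, labels, alphas):
        d += y_i * a_i * np.dot(x_i, x_test)
    return 1 if d >= 0 else -1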
Luckily, we can use another math trick. It so happens that in solving the quadratic
optimization problem of the linear SVM (i.e., when searching for a linear SVM in the
new higher dimensional space), the training tuples appear only in the form of dot products, $\phi(X_i) \cdot \phi(X_j)$, where $\phi(X)$ is simply the nonlinear mapping function applied to transform the training tuples. Instead of computing the dot product on the transformed data tuples, it turns out that it is mathematically equivalent to instead apply a kernel function, $K(X_i, X_j)$, to the original input data. That is,
$$K(X_i, X_j) = \phi(X_i) \cdot \phi(X_j). \qquad \text{(9.20)}$$
In other words, everywhere that $\phi(X_i) \cdot \phi(X_j)$ appears in the training algorithm, we can replace it with $K(X_i, X_j)$. In this way, all calculations are made in the original input space,
which is of potentially much lower dimensionality! We can safely avoid the mapping—it
turns out that we don't even have to know what the mapping is! We will talk more later
about what kinds of functions can be used as kernel functions for this problem.
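As a numerical illustration of Eq. (9.20), the sketch below (assuming NumPy) uses the degree-2 polynomial kernel $K(X_i, X_j) = (X_i \cdot X_j + 1)^2$ on 2-D inputs; the explicit mapping $\phi$ is written out only to verify the identity, since in practice we never need to compute it:

import numpy as np

def phi(x):
    # Explicit mapping whose dot product reproduces the degree-2 polynomial kernel.
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s*x1, s*x2, x1**2, x2**2, s*x1*x2])

def K(xi, xj):
    # Kernel evaluated directly on the original input data.
    return (np.dot(xi, xj) + 1.0) ** 2

xi = np.array([0.4, -1.3])
xj = np.array([2.0, 0.5])

assert np.isclose(K(xi, xj), np.dot(phi(xi), phi(xj)))   # Eq. (9.20) holds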
After applying this trick, we can then proceed to find a maximal separating hyper-
plane. The procedure is similar to that described in Section 9.3.1, although it involves
placing a user-specified upper bound, $C$, on the Lagrange multipliers, $\alpha_i$. This upper
bound is best determined experimentally.
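As an illustration of choosing $C$ experimentally, one common approach is cross-validated grid search; the sketch below uses scikit-learn and synthetic data (both our assumptions, not something the text prescribes):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Try several values of the upper bound C and keep the one that
# cross-validates best on the training data.
search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10, 100]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_)   # the experimentally chosen C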
“What are some of the kernel functions that could be used?” Properties of the kinds of
kernel functions that could be used to replace the dot product scenario just described
$^3$The dot product of two vectors, $X^T = (x_1, x_2, \ldots, x_n)$ and $X_i = (x_{i1}, x_{i2}, \ldots, x_{in})$, is $x_1 x_{i1} + x_2 x_{i2} + \cdots + x_n x_{in}$. Note that this involves one multiplication and one addition for each of the $n$ dimensions.