Example 9.2 Nonlinear transformation of original input data into a higher dimensional space.
Consider the following example. A 3-D input vector $X = (x_1, x_2, x_3)$ is mapped into a 6-D space, $Z$, using the mappings $\phi_1(X) = x_1$, $\phi_2(X) = x_2$, $\phi_3(X) = x_3$, $\phi_4(X) = (x_1)^2$, $\phi_5(X) = x_1 x_2$, and $\phi_6(X) = x_1 x_3$. A decision hyperplane in the new space is $d(Z) = WZ + b$, where $W$ and $Z$ are vectors. This is linear. We solve for $W$ and $b$ and then substitute back so that the linear decision hyperplane in the new ($Z$) space corresponds to a nonlinear second-order polynomial in the original 3-D input space:

$$d(Z) = w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 (x_1)^2 + w_5 x_1 x_2 + w_6 x_1 x_3 + b = w_1 z_1 + w_2 z_2 + w_3 z_3 + w_4 z_4 + w_5 z_5 + w_6 z_6 + b.$$
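To make the mapping concrete, here is a minimal Python sketch (assuming NumPy; the weight values and bias below are arbitrary placeholders, not values solved for in the text) that applies the six mappings of this example and checks that the linear decision function in the $Z$ space agrees with the second-order polynomial in the original 3-D space:

import numpy as np

def phi(x):
    # The six mappings of Example 9.2: (x1, x2, x3) -> (x1, x2, x3, x1^2, x1*x2, x1*x3)
    x1, x2, x3 = x
    return np.array([x1, x2, x3, x1**2, x1*x2, x1*x3])

# Placeholder weights W and bias b (in practice these are solved for by the SVM).
w = np.array([0.5, -1.0, 2.0, 0.3, -0.7, 1.2])
b = 0.1

x = np.array([1.0, 2.0, 3.0])   # original 3-D input
z = phi(x)                      # its image in the 6-D Z space

d_linear = np.dot(w, z) + b     # linear hyperplane in Z space
d_poly = (w[0]*x[0] + w[1]*x[1] + w[2]*x[2]          # same value as a second-order
          + w[3]*x[0]**2 + w[4]*x[0]*x[1]            # polynomial in the original
          + w[5]*x[0]*x[2] + b)                      # input space

assert np.isclose(d_linear, d_poly)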
But there are some problems. First, how do we choose the nonlinear mapping to
a higher dimensional space? Second, the computation involved will be costly. Refer to
Eq. (9.19) for the classification of a test tuple, $X^T$. Given the test tuple, we have to compute its dot product with every one of the support vectors.$^3$ In training, we have to compute a similar dot product several times in order to find the MMH. This is especially expensive: the required dot product computations become very heavy and costly. We need another trick!
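To see where these dot products arise at classification time, here is a small Python sketch (assuming NumPy, and assuming Eq. (9.19) takes the usual support-vector expansion form $d(X^T) = \sum_i y_i \alpha_i X_i \cdot X^T + b_0$; the function name and arguments are ours, not from the text). Note that it computes one dot product per support vector for every test tuple:

import numpy as np

def classify(x_test, support_vectors, labels, alphas, b0):
    # d(X^T) = sum_i y_i * alpha_i * (X_i . X^T) + b0 -- one dot product
    # with each support vector, which is what makes this costly.
    d = b0
    for x_i, y_i, a_i in zip(support_vectors, labels, alphas):
        d += y_i * a_i * np.dot(x_i, x_test)
    return 1 if d >= 0 else -1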
Luckily, we can use another math trick. It so happens that in solving the quadratic
optimization problem of the linear SVM (i.e., when searching for a linear SVM in the
new higher dimensional space), the training tuples appear only in the form of dot products, $\phi(X_i) \cdot \phi(X_j)$, where $\phi(X)$ is simply the nonlinear mapping function applied to transform the training tuples. Instead of computing the dot product on the transformed data tuples, it turns out that it is mathematically equivalent to instead apply a kernel function, $K(X_i, X_j)$, to the original input data. That is,
$$K(X_i, X_j) = \phi(X_i) \cdot \phi(X_j). \qquad \text{(9.20)}$$
In other words, everywhere that $\phi(X_i) \cdot \phi(X_j)$ appears in the training algorithm, we can replace it with $K(X_i, X_j)$. In this way, all calculations are made in the original input space,
which is of potentially much lower dimensionality! We can safely avoid the mapping—it
turns out that we don't even have to know what the mapping is! We will talk more later
about what kinds of functions can be used as kernel functions for this problem.
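As a numerical illustration of Eq. (9.20), the sketch below (assuming NumPy) uses the degree-2 polynomial kernel $K(X_i, X_j) = (X_i \cdot X_j + 1)^2$ on 2-D inputs; the explicit mapping $\phi$ is written out only to verify the identity, since in practice we never need to compute it:

import numpy as np

def phi(x):
    # Explicit mapping whose dot product reproduces the degree-2 polynomial kernel.
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s*x1, s*x2, x1**2, x2**2, s*x1*x2])

def K(xi, xj):
    # Kernel evaluated directly on the original input data.
    return (np.dot(xi, xj) + 1.0) ** 2

xi = np.array([0.4, -1.3])
xj = np.array([2.0, 0.5])

assert np.isclose(K(xi, xj), np.dot(phi(xi), phi(xj)))   # Eq. (9.20) holds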
After applying this trick, we can then proceed to find a maximal separating hyper-
plane. The procedure is similar to that described in Section 9.3.1, although it involves
placing a user-specified upper bound, $C$, on the Lagrange multipliers, $\alpha_i$. This upper
bound is best determined experimentally.
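As an illustration of choosing $C$ experimentally, one common approach is cross-validated grid search; the sketch below uses scikit-learn and synthetic data (both our assumptions, not something the text prescribes):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Try several values of the upper bound C and keep the one that
# cross-validates best on the training data.
search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10, 100]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_)   # the experimentally chosen C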
“What are some of the kernel functions that could be used?” Properties of the kinds of
kernel functions that could be used to replace the dot product scenario just described
$^3$The dot product of two vectors, $X^T = (x_1, x_2, \ldots, x_n)$ and $X_i = (x_{i1}, x_{i2}, \ldots, x_{in})$, is $x_1 x_{i1} + x_2 x_{i2} + \cdots + x_n x_{in}$. Note that this involves one multiplication and one addition for each of the $n$ dimensions.