4.2 Kernel Feature Space versus PCA Feature Space

In PCA, the covariance matrix of the input data is computed after preprocessing steps such as calculating the mean and centring the data. The feature space is then obtained by extracting the eigenvectors of this matrix. Each eigenvector has a corresponding eigenvalue whose magnitude determines the importance of its principal axis. Projecting the data onto a subspace spanned by some of these axes is the actual data transformation, and the number of principal axes chosen for the projection determines the dimension of the transformed data. This means the dimension of the input data can be reduced even to one, by using only the first principal axis for the projection. Conversely, the maximum dimension the data can have after transformation equals its original dimension, reached when all principal axes are used. Note that in PCA the dimension of the feature space, from which the spanned subspace is selected, is equal to the dimension of the input data. For example, let X = [x_1, …, x_N], where x_t ∈ R^d, meaning that we have N input data points of dimension d. In this case, no matter how large N is, the feature space has the fixed dimension d. This can be considered an advantage of PCA: the dimension of the feature space is limited to that of the input data, and it does not grow too high when analyzing a potentially large number of data points.
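The PCA procedure described above can be sketched in NumPy. The data sizes here (N = 5 samples, d = 3) are illustrative choices, not taken from the text:

```python
import numpy as np

# Hypothetical toy data: N = 5 samples, each of dimension d = 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # rows are the samples x_1, ..., x_N

# Center the data and form the d x d covariance matrix.
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / (len(X) - 1)  # shape (3, 3): set by d, not by N

# Eigendecomposition; order eigenvectors by decreasing eigenvalue,
# since the eigenvalue magnitude ranks the principal axes.
vals, vecs = np.linalg.eigh(C)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Project onto the first principal axis only: dimension drops from d to 1.
Z = Xc @ vecs[:, :1]
print(C.shape, Z.shape)  # (3, 3) (5, 1)
```

Projecting onto all three columns of `vecs` instead would keep the transformed data at the original dimension d, the maximum mentioned above.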

In KPCA, however, the situation is different: the kernel feature space has a higher dimension than that of the input data. In KPCA, the whole input data space is mapped into the kernel feature space by a specific nonlinear kernel function based on inner products, and the transformation is obtained by treating the kernel matrix as the input data for PCA. This means that, unlike PCA, the dimension of the kernel feature space does not depend on the dimension of the input data; instead, it is equal to the number of input data points. Let X be [x_1, …, x_N], for instance, with x_t ∈ R^d, meaning that we have N input data points of dimension d. In KPCA, no matter how high the input dimension d is, the kernel feature space is N-dimensional, since the kernel matrix (Gram matrix) is [N × N]. The kernel matrix and its feature space are defined by Shawe-Taylor and Cristianini:

Given a set of vectors X = [x_1, …, x_N], the Gram matrix is defined as the [N × N] matrix G whose entries are G_ij = ⟨x_i, x_j⟩. If we are using a kernel function k to evaluate the inner products in a feature space with feature map φ, the associated Gram matrix has entries

G_ij = ⟨φ(x_i), φ(x_j)⟩ = k(x_i, x_j)    (12)

In this case, the matrix is often referred to as the kernel matrix, displayed as follows:

K = | k(x_1, x_1)  k(x_1, x_2)  …  k(x_1, x_N) |
    | k(x_2, x_1)  k(x_2, x_2)  …  k(x_2, x_N) |
    |      ⋮            ⋮                ⋮      |
    | k(x_N, x_1)  k(x_N, x_2)  …  k(x_N, x_N) |
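The Gram matrix of plain inner products is simply X X^T, and swapping in a kernel changes only how each entry is evaluated. A minimal sketch, using an illustrative degree-2 polynomial kernel (not one prescribed by the text):

```python
import numpy as np

# N = 3 hypothetical vectors in R^2; rows of X are x_1, x_2, x_3.
X = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])

# Gram matrix of ordinary inner products: G_ij = <x_i, x_j>, i.e. X @ X.T.
G = X @ X.T
print(G.shape)  # (3, 3), i.e. [N x N]

# With a kernel k, the entries become k(x_i, x_j) = <phi(x_i), phi(x_j)>.
# The polynomial kernel k(a, b) = (<a, b> + 1)**2 corresponds to an
# implicit degree-2 feature map phi.
K = (G + 1.0) ** 2
print(K[0, 0])  # k(x_1, x_1) = (1 + 1)**2 = 4.0
```

Note that both G and K are [N × N] however large the vector dimension d is, which is exactly the point made above about the kernel feature space in KPCA.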
