Fig. 1. Data distribution when different portions of the data are projected into 2D space
Here we reformulate semi-supervised dimension reduction so that it inherits the supervised KSIR. Let $K$ denote the kernelized entire dataset and $K^{l}$ the kernelized labelled dataset. The goal becomes solving the following generalized eigenvalue problem:

$$\Sigma_{E(K \mid Y_J)}\,\beta = \lambda\,\Sigma_K\,\beta. \qquad (7)$$
The between-class covariance is then

$$\Sigma_{E(K \mid Y_J)} = \frac{1}{n}\sum_{j=1}^{J} n_j\,\bigl(\bar{K}_j - \bar{K}\bigr)\bigl(\bar{K}_j - \bar{K}\bigr)^{\top}, \qquad (8)$$
where $\bar{K}$ is the grand mean computed over the kernelized entire dataset, including both labelled and unlabelled data, and $\bar{K}_j$ is the sample mean of the $j$-th slice, estimated from the kernelized labelled data of that slice.
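To make Eqs. (7)-(8) concrete, the following is a minimal sketch in Python of how the between-slice covariance and the generalized eigenproblem could be computed. The helper name `ksir_directions`, the ridge term `reg`, and all variable names are our own illustrative assumptions, not notation from the paper.

```python
# Sketch of the semi-supervised KSIR eigenproblem in Eqs. (7)-(8).
# Assumes K_all is the n x n kernel matrix over ALL data (labelled +
# unlabelled), labelled_idx are the row indices of labelled points,
# and y gives their slice (class) labels. Names are hypothetical.
import numpy as np
from scipy.linalg import eigh

def ksir_directions(K_all, labelled_idx, y, n_components=None, reg=1e-6):
    n = K_all.shape[0]
    K_bar = K_all.mean(axis=0)           # grand mean over labelled + unlabelled data
    classes = np.unique(y)
    J = len(classes)

    # Between-slice covariance, Eq. (8): weighted scatter of slice means,
    # where each slice mean uses only the labelled rows of that slice.
    Sigma_between = np.zeros((n, n))
    for c in classes:
        rows = labelled_idx[y == c]
        K_j = K_all[rows].mean(axis=0)
        d = (K_j - K_bar).reshape(-1, 1)
        Sigma_between += len(rows) * (d @ d.T)
    Sigma_between /= n

    # Total covariance of the kernelized data (right-hand side of Eq. (7)).
    Kc = K_all - K_bar
    Sigma_K = (Kc.T @ Kc) / n

    if n_components is None:
        n_components = J - 1             # effective rank is at most J - 1
    # Generalized eigenproblem Sigma_between b = lambda Sigma_K b;
    # a small ridge keeps the low-rank Sigma_K positive definite.
    vals, vecs = eigh(Sigma_between, Sigma_K + reg * np.eye(n))
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order]                # e.d.r. directions in kernel feature space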
When solving this eigenvalue decomposition problem, we observe that the effective rank of the covariance matrix of the kernel data is quite low, so we take only the top $J-1$ components for the KSIR computation, where $J$ is the number of classes. This restricts the method's use in binary classification problems, since only one direction can be obtained, which drops much of the information in the e.d.r. subspace. A basic clustering approach is therefore applied to separate the training data into more classes for finding the KSIR directions. Believing that a higher-dimensional representation can describe the data more accurately, for the labelled training data in a binary classification task we use $K$-means to partition the instances of each class into two clusters. Then, using those four clusters as new labels, KSIR computes the $J-1$ directions (three directions in this example, since $J = 4$). After the directions are obtained, the original labels are restored and an SVM is trained on the projected data to classify, as sketched below. In this way we provide more descriptive variables for the classification task.
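The sketch below, again only illustrative, wires these steps together: split each binary class into two $K$-means clusters, run the hypothetical `ksir_directions` helper from the previous sketch on the four pseudo-classes, then train an SVM on the projected data with the original binary labels. The RBF kernel, the synthetic demo data, and all parameter settings are assumptions, not choices from the paper.

```python
# Cluster-split KSIR + SVM pipeline sketch; relies on the assumed
# ksir_directions() from the previous sketch.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def split_binary_labels(X, y, labelled_idx):
    """Split each of the two original classes into two K-means clusters,
    producing four pseudo-classes so KSIR can yield J - 1 = 3 directions."""
    y_split = np.empty(len(labelled_idx), dtype=int)
    for c in (0, 1):
        mask = (y == c)
        km = KMeans(n_clusters=2, n_init=10).fit(X[labelled_idx[mask]])
        y_split[mask] = 2 * c + km.labels_   # pseudo-labels 0/1 for class 0, 2/3 for class 1
    return y_split

# Synthetic demo data: 200 points, first 100 labelled with binary labels.
X = np.random.randn(200, 10)
labelled_idx = np.arange(100)
y = (X[labelled_idx, 0] > 0).astype(int)

K_all = rbf_kernel(X, X)                     # kernelized entire dataset
y_split = split_binary_labels(X, y, labelled_idx)
B = ksir_directions(K_all, labelled_idx, y_split)  # 3 directions from 4 pseudo-classes

# Project all data onto the KSIR directions, then train the SVM
# with the ORIGINAL binary labels, as described above.
Z = K_all @ B
clf = SVC().fit(Z[labelled_idx], y)
```

Note the design point this illustrates: the four pseudo-labels exist only to let KSIR extract more than one direction; they are discarded before the SVM is trained.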