that a small portion of labeled instances can reflect the statistical structure of
the entire data; the global structure can be preserved through sampling
without losing much information.
We first use a toy example to examine how the KSIR e.d.r. subspace varies with the
amount of labeled data used to construct it. In the experiment, we use different
portions (t%, t = 10, 20, ..., 100) of the labeled instances to build the
kernelized covariance matrix and obtain the KSIR projection direction u_t%; we then
use u_t% to project one of the cluster means onto a 2D space. Figure 1
visualizes the distribution of the projections obtained from the different portions
of labeled data. The colored dots correspond to the different amounts of labeled data
used to compute the KSIR directions, and the large black dot corresponds to the
direction built from the entire labeled set, which serves as the ground truth here.
When only a small portion of the labeled data is used, the projected mean varies
considerably because of the random sampling.
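To make the procedure concrete, the sketch below reproduces this sampling experiment on synthetic data. It is only a rough illustration under our own assumptions: the kernel feature map is approximated with random Fourier features (scikit-learn's RBFSampler) rather than the exact kernelized formulation in the paper, and the toy data set, kernel parameters, and the helper name sir_directions are placeholders chosen for the example.

import numpy as np
from scipy.linalg import eigh
from sklearn.kernel_approximation import RBFSampler

def sir_directions(Phi, y, n_dirs=2, reg=1e-3):
    # Sliced Inverse Regression in an (approximate) kernel feature space.
    # Phi: (n, d) feature-mapped data; y: (n,) class labels used as slices.
    mu = Phi.mean(axis=0)
    Sigma = np.cov(Phi, rowvar=False)                 # total covariance
    Gamma = np.zeros_like(Sigma)                      # between-slice covariance
    for c in np.unique(y):
        mask = (y == c)
        diff = Phi[mask].mean(axis=0) - mu
        Gamma += mask.mean() * np.outer(diff, diff)
    # top e.d.r. directions from the generalized eigenproblem Gamma v = lambda Sigma v
    _, vecs = eigh(Gamma, Sigma + reg * np.eye(Sigma.shape[0]))
    return vecs[:, ::-1][:, :n_dirs]

rng = np.random.default_rng(0)
# Three Gaussian clusters in 5 dimensions serve as the toy data.
X = np.vstack([rng.normal(m, 1.0, size=(200, 5)) for m in (-2.0, 0.0, 2.0)])
y = np.repeat([0, 1, 2], 200)

feat = RBFSampler(gamma=0.5, n_components=300, random_state=0)
Phi = feat.fit_transform(X)

U_full = sir_directions(Phi, y)                       # directions from 100% of the labels
ref = Phi[y == 0].mean(axis=0) @ U_full               # projected mean of one cluster

for t in range(10, 101, 10):
    idx = rng.choice(len(X), size=len(X) * t // 100, replace=False)
    U_t = sir_directions(Phi[idx], y[idx])
    # eigenvectors are sign-ambiguous, so align each direction with the full-data one
    U_t *= np.sign(np.sum(U_t * U_full, axis=0))
    proj = Phi[y == 0].mean(axis=0) @ U_t
    print(f"{t:3d}% labels  distance to full-data projection: {np.linalg.norm(proj - ref):.4f}")

Averaging this distance over many random subsamples, as Table 1 does, gives the reported mean and standard deviation for each label percentage.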
Table 1 gives the corresponding numerical results: the distribution of the
projections obtained when different portions of the labeled data are used to
calculate the KSIR direction. The cluster mean in this toy example serves as a
representative of the original data. Both the graphical and the numerical results
show that the projected mean obtained from different amounts of labeled data
changes only slightly.
Table 1. Average and standard deviation of the distance to the actual means after
projection with different amounts of labeled data

Label %    Average    Standard Deviation
  10%      1.0558     0.5190
  20%      0.6110     0.4009
  30%      0.4133     0.3532
  40%      0.3317     0.3421
  50%      0.2429     0.2957
  60%      0.1474     0.1781
  70%      0.1133     0.1522
  80%      0.0972     0.1334
  90%      0.0495     0.0267
 100%      0.0000     0.0000
Based on the above observations, we therefore assume that a small amount of labeled
instances can closely reflect the distribution of the entire labeled data. In this
semi-supervised problem setting, our methods rely on the assumption that the more
uniformly the labeled data are distributed, the better the estimate of the slice
means will be. In the semi-supervised dimension reduction, the kernelized covariance
matrix and the between-class covariance matrix used to generate the e.d.r. subspace
in KSIR are therefore estimated from the small portion of labeled data.
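As a rough sketch of this semi-supervised workflow, the snippet below reuses the hypothetical sir_directions helper from the earlier example, under the same random-feature approximation: the e.d.r. directions are estimated from the labeled subset only, and then every instance, labeled and unlabeled alike, is projected into the reduced space. The data and labels here are synthetic placeholders, not a data set from the experiments.

rng = np.random.default_rng(1)
X_all = rng.normal(size=(1000, 20))                        # all instances, labeled and unlabeled
lab_idx = rng.choice(len(X_all), size=100, replace=False)  # e.g. 10% of the data is labeled
y_lab = rng.integers(0, 3, size=len(lab_idx))              # placeholder labels for the labeled subset

feat = RBFSampler(gamma=0.1, n_components=200, random_state=1)
Phi_all = feat.fit_transform(X_all)                        # feature map is fit on all instances

# The covariance statistics defining the KSIR e.d.r. subspace are estimated
# from the small labeled portion only, as described above.
U = sir_directions(Phi_all[lab_idx], y_lab, n_dirs=2)

X_reduced = Phi_all @ U                                    # every instance in the e.d.r. subspace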
Table 2. Summary of the data sets used in the experiments

Data set    Instances    Dimension    Classes
USPS           11,000          256         10
PIE CMU         1,700        1,024          5
COIL20          1,440        1,024         20
Adult           4,521           14          2
Text            1,946        7,511          2
COLT 98         1,051        4,840          2
 