Table 5.2. Data sets for IS-TSS.
Data set                 Num. instances  Num. features  Num. classes
Pen-based recognition    10992           16             10
SatImage                 6435            36             6
Thyroid                  7200            21             3
5.6.1.2 Instance Selection - Training Set Selection
To study the behavior of the IS algorithms in TSS adequately, we need data sets
with more instances than those in Table 5.1. We have therefore chosen three
databases that contain more than 6,000 and up to 11,000 instances, which allows
the scaling-up behavior of the IS algorithms to be analyzed. They are listed in
Table 5.2.
Pen-Based Recognition: A digit database was created by collecting 250
samples from each of 44 writers. A WACOM PL-100V pressure-sensitive tablet with
an integrated LCD display and a cordless stylus was used. The input and display
areas of the tablet are located in the same place. The tablet was attached to the
serial port of an Intel 486-based PC to collect the handwriting samples. Each
writer was asked to write 250 digits in random order inside boxes of 500-by-500
tablet pixel resolution. The raw data captured from the tablet consist of integer
values between 0 and 500.
SatImage : The database consists of the multispectral values of pixels in 3x3
neighborhoods in a satellite image, and the classification associated with the
central pixel in each neighborhood. The aim is to predict this classification, given
the multispectral values. In the sample database, the class of a pixel is coded as a
number.
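As a rough illustration of how a 3x3 neighborhood becomes one instance, the sketch below flattens the spectral values of the nine pixels into a single feature vector. The 36 features in Table 5.2 are consistent with four spectral values per pixel over nine pixels; that reading, the row-by-row ordering, the function name neighborhood_features, and the toy pixel values are all assumptions made for the example, not details taken from the database itself.

```python
def neighborhood_features(image, row, col):
    """Flatten the 3x3 neighborhood centered at (row, col) into one vector.

    `image[r][c]` is assumed to be a list of spectral values for pixel (r, c).
    Pixels are visited row by row; this ordering is an assumption.
    """
    features = []
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            features.extend(image[r][c])
    return features


# Toy 3x3 image with 4 spectral values per pixel (values are invented).
bands = 4
image = [[[10 * r + c + b for b in range(bands)] for c in range(3)]
         for r in range(3)]

vector = neighborhood_features(image, 1, 1)
central_pixel = vector[4 * bands:5 * bands]  # values of the central pixel
```

The class label attached to such a vector would be the one of the central pixel, as described above.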
Thyroid: The aim is to determine whether a patient referred to the clinic is
hypothyroid. Three classes are defined: normal (not hypothyroid),
hyperfunction, and subnormal functioning.
5.6.2 Partitions
Because IS-PS and IS-TSS follow different strategies, we use a different
partitioning model for each one.
5.6.2.1 Instance Selection - Prototype Selection
The sets considered for IS-PS are partitioned using the ten-fold cross-validation
procedure. Each data set, D, is randomly divided into ten disjoint sets of equal
size, D_1, ..., D_10. We then construct ten pairs of training and test sets,
(T_i, t_i), i = 1, …,
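The ten-fold partitioning described above can be sketched as follows. This is a minimal sketch under stated assumptions: it takes the test set t_i to be the fold D_i and the training set T_i to be the union of the remaining nine folds, which is the usual cross-validation convention but is not spelled out in the text available here. The function name ten_fold_partitions, the seed parameter, and the 100-instance toy data set are illustrative only.

```python
import random


def ten_fold_partitions(instances, k=10, seed=0):
    """Randomly split `instances` into k disjoint folds and return the
    list of (training, test) pairs, one pair per fold."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # random division of D
    # Fold j takes every k-th instance, so fold sizes differ by at most one.
    folds = [shuffled[j::k] for j in range(k)]
    pairs = []
    for i in range(k):
        test = folds[i]  # assumed: t_i = D_i
        # assumed: T_i = D minus D_i
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        pairs.append((training, test))
    return pairs


# Toy data set of 100 instance identifiers (illustrative only).
data = list(range(100))
pairs = ten_fold_partitions(data)
```

With 100 instances, each pair holds 90 training and 10 test instances, and every instance appears in exactly one test set.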
 