5.2.2.1 Instance Selection for Prototype Selection (IS-PS)
This strategy applies instance selection (IS) with classification as the objective.
Prototype selection (PS). A 1-NN classifier predicts the class of a previously
unseen instance by computing its similarity to a set of stored instances called
prototypes. PS, that is, storing a well-selected, proper subset of the available
training instances, has been shown to increase classifier accuracy in many domains [10].
At the same time, using prototypes dramatically decreases storage and
classification-time costs.
A PS algorithm is an IS algorithm that attempts to obtain a subset of the
training set that allows the 1-NN classifier to achieve the maximum classification
rate. Figure 5.1 shows where the PS algorithm fits into this process.
Many PS algorithms have been proposed to identify the salient instances that a
1-NN classifier should store; some are surveyed in Section 5.3.
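As an illustration only, the following minimal sketch implements one classic PS heuristic, Hart's condensed nearest neighbor rule. This section does not name that algorithm, so treat it as a swapped-in example; the function names (`nearest_label`, `condensed_nn`), the Euclidean distance, and the toy data are assumptions made for this sketch.

```python
import math

def nearest_label(prototypes, x):
    """Return the label of the stored prototype closest to x (1-NN rule)."""
    best = min(prototypes, key=lambda p: math.dist(p[0], x))
    return best[1]

def condensed_nn(training_set):
    """Condensing sketch: store only instances the current prototypes misclassify."""
    prototypes = [training_set[0]]  # seed with the first instance (an assumption)
    changed = True
    while changed:  # repeat passes until no instance is added
        changed = False
        for point, label in training_set:
            if (point, label) in prototypes:
                continue  # already stored
            if nearest_label(prototypes, point) != label:
                prototypes.append((point, label))
                changed = True
    return prototypes

# Hypothetical toy data: two well-separated classes in the plane.
training_set = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b"),
]
prototypes = condensed_nn(training_set)
```

On this toy data the selected subset is much smaller than the training set, yet a 1-NN classifier that stores only the prototypes still labels every training instance correctly. The result depends on the order in which instances are visited, and the rule aims at consistency on the training set rather than a guaranteed maximum classification rate.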
5.2.2.2 Instance Selection for Training Set Selection (IS-TSS)
In this section, we describe IS as a data reduction process whose objective is
training set selection.
Training set selection (TSS). In some situations there are too many data points,
and these data are almost never equally useful in the training phase of
a learning algorithm [32]. It is intuitively clear that those data points that fall near
the decision boundary between two classes are likely to be more influential on the
DM algorithm than points that are well inside. Similarly, if several points from the
same class are very close to each other, the information they convey is virtually
the same, so are they all necessary? IS mechanisms have been proposed for
choosing the most suitable points in the data set that should become instances for
the training data set used by a learning algorithm. For example, in [32], a genetic
algorithm is used for training data selection in radial basis function networks. In
[30], there is a comprehensive study on the general question of training data
selection in the context of function approximation (i.e., regression problems).
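The intuition about boundary points can be sketched as follows. This is a simple heuristic written for illustration, not the genetic algorithm of [32] or the approach studied in [30]: it keeps an instance only if at least one of its k nearest neighbors belongs to a different class. The function name `select_boundary_points`, the parameter `k`, and the one-dimensional toy data are assumptions.

```python
import math

def select_boundary_points(data, k=2):
    """Keep instances whose k nearest neighbors include another class."""
    selected = []
    for i, (point, label) in enumerate(data):
        # All instances except the one under consideration.
        others = [d for j, d in enumerate(data) if j != i]
        neighbors = sorted(others, key=lambda d: math.dist(d[0], point))[:k]
        if any(lbl != label for _, lbl in neighbors):
            selected.append((point, label))  # near the decision boundary
    return selected

# Hypothetical toy data: class "a" on the left, class "b" on the right.
data = [
    ((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"),
    ((3.0,), "b"), ((4.0,), "b"), ((5.0,), "b"),
]
subset = select_boundary_points(data, k=2)
# subset == [((2.0,), "a"), ((3.0,), "b")]
```

Only the two points that face each other across the class boundary are kept; the points well inside each class are discarded as conveying little additional information. Whether such an aggressive reduction helps a particular learning algorithm is an empirical question.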
[Figure: Training data set -> Prototype selection algorithm -> Instances selected -> 1-nearest neighbor classifier]
Fig. 5.1. IS-PS strategy.
 