Chapter 8
Instance Selection
Abstract In this chapter, we consider instance selection as an important focusing
task in the data reduction phase of knowledge discovery and data mining. First,
we give a broad perspective on concepts and topics related to instance selection
(Sect. 8.1). Because instance selection has been distinguished over the years
as two types of tasks, depending on the data mining method applied later, we clearly
separate it into two processes: training set selection and prototype selection. These
trends are explained in Sect. 8.2. Thereafter, focusing on prototype selection, we
present a unifying framework that covers existing properties and yields
a complete taxonomy (Sect. 8.3). The operation of the best-known
and some recent instance and/or prototype selection methods is described in
Sect. 8.4. Advanced and recent approaches that incorporate novel solutions based on
hybridizations with other types of data reduction techniques or similar solutions are
collected in Sect. 8.5. Finally, we summarize example evaluation results for prototype
selection in an exhaustive experimental comparative analysis in Sect. 8.6.
8.1 Introduction
Instance selection (IS) plays a pivotal role in the data reduction task because
it performs the process complementary to feature selection (FS). Although it is
independent of FS, in most cases both processes are applied jointly. Enormous
amounts of data can be faced by scaling down the data, as an
alternative to improving the scaling-up of the DM algorithms. We have previously
seen that FS already accomplishes this objective through the removal of irrelevant
and unnecessary features. In an orthogonal way, the removal of instances can be
considered equally or even more interesting from the point of view of scaling down
the data in certain applications [108].
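The orthogonality of the two reductions can be pictured on a small table: FS drops columns, IS drops rows. The following is a minimal sketch, not a selection method. The toy values, the helper names `select_features` and `select_instances`, and the index lists are all hypothetical; real FS and IS algorithms would compute the retained indices from the data.

```python
# Toy dataset: 4 instances (rows) x 3 features (columns). Values are made up.
data = [
    [5.1, 3.5, 0.2],
    [4.9, 3.0, 0.2],
    [6.3, 3.3, 2.5],
    [5.8, 2.7, 1.9],
]


def select_features(rows, keep_cols):
    """FS view: keep the chosen columns for every instance."""
    return [[row[c] for c in keep_cols] for row in rows]


def select_instances(rows, keep_rows):
    """IS view: keep the chosen instances with all of their features."""
    return [rows[r] for r in keep_rows]


# Hypothetical outcomes of an FS step and an IS step (assumed, not computed).
keep_cols = [0, 2]
keep_rows = [0, 2]

fs_only = select_features(data, keep_cols)    # 4 instances x 2 features
is_only = select_instances(data, keep_rows)   # 2 instances x 3 features

# With fixed index sets, the two reductions can be applied in either order.
fs_then_is = select_instances(select_features(data, keep_cols), keep_rows)
is_then_fs = select_features(select_instances(data, keep_rows), keep_cols)
```

With fixed index sets, both orders give the same 2 x 2 table. In practice the indices chosen by one step may depend on whether the other step has already been applied, so this equivalence should not be assumed for real methods.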
The major issue in scaling down the data is the selection or identification of
relevant data from an immense pool of instances and its subsequent preparation as input for
a DM algorithm. Selection is synonymous with pressure in many scenarios, such as in
 