8.3.1.2 Type of Selection
This factor is mainly conditioned by the type of search carried out by the PS algorithms: whether they seek to retain border points, central points, or some other set of points.
Condensation: This set includes the techniques which aim to retain the points closest to the decision boundaries, also called border points. The intuition behind retaining border points is that internal points do not affect the decision boundaries as much as border points do, and thus can be removed with relatively little effect on classification. The idea is to preserve the accuracy over the training set, although the generalization accuracy over the test set can be negatively affected. Nevertheless, the reduction capability of condensation methods is normally high, because there are fewer border points than internal points in most data.
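As a concrete illustration, the following is a minimal Python sketch of a classic condensation technique, Hart's condensed nearest neighbor (CNN) rule. The function name condense, the random presentation order, and the array-based data layout are choices made here for clarity, not part of the original formulation.

```python
import numpy as np

def condense(X, y, seed=0):
    """Sketch of Hart's CNN rule: keep every instance that the
    current subset misclassifies under the 1NN rule."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))      # presentation order (a choice made here)
    S = [order[0]]                       # start the subset with a single instance
    changed = True
    while changed:                       # repeat until a full pass adds nothing
        changed = False
        for i in order:
            if i in S:
                continue
            d = np.linalg.norm(X[S] - X[i], axis=1)   # distances to the subset
            if y[S][np.argmin(d)] != y[i]:            # 1NN prediction wrong?
                S.append(i)              # border-like point: must be kept
                changed = True
    return np.array(S)                   # indices of the retained instances
```

Note that instances far from the boundaries are classified correctly by the growing subset and are therefore never added, which is exactly why condensation yields high reduction rates.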
Edition: These algorithms instead seek to remove border points. They eliminate instances that are noisy or that disagree with their neighbors, which leaves smoother decision boundaries behind. However, such algorithms retain internal points, even those that do not necessarily contribute to the decision boundaries. The effect obtained is an improvement in generalization accuracy on test data, although the reduction rate achieved is low.
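A well-known edition technique is Wilson's edited nearest neighbor (ENN) rule; the sketch below, with the illustrative name edit, removes every instance whose class label disagrees with the majority of its k nearest neighbors. Integer class labels are assumed so that np.bincount can tally the votes.

```python
import numpy as np

def edit(X, y, k=3):
    """Sketch of Wilson's ENN rule: drop every instance whose label
    disagrees with the majority of its k nearest neighbors."""
    keep = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]        # skip the instance itself
        votes = np.bincount(y[neighbors], minlength=y.max() + 1)
        if votes.argmax() == y[i]:                # neighborhood agrees: keep
            keep.append(i)
    return np.array(keep)                         # indices of retained instances
```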
Hybrid: Hybrid methods try to find the smallest subset S which maintains or even increases the generalization accuracy on test data. To achieve this, they allow the removal of both internal and border points, based on the criteria followed by the two previous strategies. The kNN classifier is highly adaptable to these methods, obtaining great improvements even with a very small subset of selected instances.
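The sketch below illustrates one simple way to combine the two criteria, applying edition first and condensation afterwards. This particular composition is an illustration of the hybrid idea rather than a specific published algorithm, and it reuses the edit and condense sketches above.

```python
def hybrid_select(X, y, k=3, seed=0):
    """Illustrative hybrid: edition (ENN) first smooths the boundaries,
    then condensation (CNN) removes redundant internal points."""
    kept = edit(X, y, k=k)                    # indices surviving edition
    condensed = condense(X[kept], y[kept], seed=seed)
    return kept[condensed]                    # map back to original indices
```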
8.3.1.3 Evaluation of Search
kNN is a simple technique and it can be used to direct the search of a PS algorithm. The objective pursued is to make a prediction about a non-definitive selection and to compare candidate selections. This characteristic influences the quality criterion, and it can be divided into:
Filter: The kNN rule is used on partial data to determine the criterion for adding or removing instances, and no leave-one-out validation scheme is used to obtain a good estimation of generalization accuracy. Using subsets of the training data in each decision increases the efficiency of these methods, but the accuracy may not be enhanced.
Wrapper: The kNN rule is used on the complete training set with the leave-one-out validation scheme. The combined use of these two factors yields a good estimation of generalization accuracy, which helps to obtain better accuracy over test data. However, each decision involves a complete computation of the kNN rule over the training set, so the learning phase can be computationally expensive.
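To make the contrast concrete, a wrapper-style quality criterion can be sketched as leave-one-out 1NN accuracy over the complete training set; the function name loo_fitness and the choice of k = 1 are assumptions made here for illustration. A filter method would instead base each decision on a partial neighborhood computation and skip this full leave-one-out loop.

```python
import numpy as np

def loo_fitness(X, y, subset):
    """Wrapper-style criterion: leave-one-out 1NN accuracy of a
    candidate subset, measured over the complete training set."""
    correct = 0
    for i in range(len(X)):
        S = [j for j in subset if j != i]    # leave instance i out of the subset
        if not S:
            continue
        d = np.linalg.norm(X[S] - X[i], axis=1)
        correct += y[S][np.argmin(d)] == y[i]
    return correct / len(X)
```

Because this score is recomputed for every candidate decision, its cost grows with the size of the training set, which is the computational drawback noted above.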