5.3 Survey of Instance Selection Algorithms
This section surveys several techniques, discusses them in light of the framework
presented in Section 5.2.1, and points out their most interesting differences.
Except where noted, the models use k = 1 (1-NN), where k is the number of
neighbors evaluated, although in most cases the algorithms can be modified to use
k > 1. These algorithms represent the selected data as a subset S of the original
instances in the training set T and primarily use the Euclidean distance function.
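For reference, the Euclidean distance between two instances described by m numeric attributes is the standard

$$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$$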
5.3.1 Nearest Neighbor Editing Rules
This subsection collects the classic algorithms based on the nearest neighbor rule.
5.3.1.1 Condensed Nearest Neighbor (CNN) [17]
This algorithm finds a subset S of the training set T such that every member of T is
closer to a member of S of the same class than to a member of S of a different
class. In this way, the subset S can be used to classify all the instances in T
correctly.
This algorithm begins by randomly selecting one instance belonging to each
output class from T and putting them in S. Then each instance in T is classified
using only instances in S. If an instance is misclassified, it is added to S, thus
ensuring that it will be classified correctly. This process is repeated until there are
no instances in T that are misclassified. This algorithm ensures that all instances in
T are classified correctly, but it does not guarantee a minimal set.
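As an illustration, a minimal CNN sketch in Python with NumPy follows. The function name cnn, the array-based representation, and the random seeding are our own choices for the example, not part of the original algorithm description.

```python
import numpy as np

def cnn(X, y, seed=None):
    """Condensed Nearest Neighbor: greedily build a subset S of (X, y)
    such that 1-NN over S classifies every instance in T correctly."""
    rng = np.random.default_rng(seed)
    # Seed S with one randomly chosen instance of each class.
    S = [int(rng.choice(np.flatnonzero(y == c))) for c in np.unique(y)]
    changed = True
    while changed:                          # repeat until a full pass adds nothing
        changed = False
        for i in range(len(X)):
            if i in S:
                continue
            # 1-NN classification of instance i using only the current S.
            dists = np.linalg.norm(X[S] - X[i], axis=1)
            nearest = S[int(np.argmin(dists))]
            if y[nearest] != y[i]:          # misclassified -> absorb into S
                S.append(i)
                changed = True
    return np.array(S)
```

The returned indices select the condensed subset, e.g. X[kept], y[kept] for kept = cnn(X, y).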
5.3.1.2 Edited Nearest Neighbor (ENN) [11]
In this algorithm S starts out the same as T, and then each instance in S is removed
if it does not agree with the majority of its k nearest neighbors (with k = 3,
typically). This edits out noisy instances and close border cases, leaving smoother
decision boundaries. It also retains all internal points, which keeps it from
reducing the storage requirements as much as most other reduction algorithms.
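A sketch of ENN in the same style is given below; it assumes integer class labels starting at 0 (required by np.bincount) and breaks majority-vote ties in favor of the lowest label, a detail the description above leaves open.

```python
import numpy as np

def enn(X, y, k=3):
    """Edited Nearest Neighbor: drop every instance whose class label
    disagrees with the majority of its k nearest neighbors in T."""
    keep = np.ones(len(X), dtype=bool)
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                       # a point is not its own neighbor
        neighbors = np.argsort(d)[:k]
        votes = np.bincount(y[neighbors])   # assumes labels 0, 1, 2, ...
        if votes.argmax() != y[i]:          # majority disagrees -> edit out
            keep[i] = False
    return np.flatnonzero(keep)             # indices of the retained subset
```

Note that all removal flags are computed against the original T before any instance is discarded, matching the batch character of the editing rule.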
5.3.1.3 Repeated Edited Nearest Neighbor (RENN) [38]
The repeated ENN (RENN) applies the ENN algorithm repeatedly until all
instances remaining have a majority of their neighbors with the same class, which
continues to widen the gap between classes and smooths the decision boundary.
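RENN then reduces to iterating the enn function from the previous sketch until a pass removes nothing; the wrapper below, including its stopping guard, is our own illustrative choice.

```python
import numpy as np

def renn(X, y, k=3):
    """Repeated ENN: re-apply ENN until no further instance is removed."""
    idx = np.arange(len(X))              # indices into the original T
    while len(idx) > k:                  # guard against a degenerate subset
        kept = enn(X[idx], y[idx], k=k)  # enn() from the sketch in Sect. 5.3.1.2
        if len(kept) == len(idx):        # fixed point: nothing edited out
            break
        idx = idx[kept]
    return idx
```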