It is well known that, under general statistical assumptions, the nearest-neighbor
decision rule has a misclassification rate at most twice the optimal Bayes error
rate, even in the worst case. This result is relatively weak in that it is
asymptotic: it holds only as the number of samples grows without bound.
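For reference, the standard formalization of this claim is the Cover-Hart bound (not given explicitly in the text above): if \(R^{*}\) denotes the Bayes risk and \(R_{\mathrm{NN}}\) the asymptotic risk of the nearest-neighbor rule on a \(c\)-class problem, then
\[
R^{*} \le R_{\mathrm{NN}} \le R^{*}\left(2 - \frac{c}{c-1}\,R^{*}\right) \le 2R^{*}.
\]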
5.10.3 Reducing storage requirements
In the IB1 algorithm, only the instances that lie between the ε-neighborhood and
the ε-core of C contribute to producing an accurate approximation of the
target concept; the remaining instances play no role in determining the
concept boundary. Keeping only those useful instances would therefore save a
large amount of memory. Without full knowledge of the concept boundary,
however, this set is not known, but it can be approximated by the set of
misclassified instances. This is the key idea of the IB2 algorithm (Algorithm 5.2).
Algorithm 5.2: IB2 algorithm.
1. CD ← ∅ // CD = concept description
2. for each x ∈ Training Set do
3.   for each y ∈ CD do
4.     sim[y] ← similarity(x, y)
5.   y_max ← some y ∈ CD with maximal sim[y]
6.   if class(x) = class(y_max)
7.   then classification ← correct
8.   else
9.     classification ← incorrect
10.    CD ← CD ∪ {x}
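The following is a minimal Python sketch of this loop. The similarity function is not specified by the pseudocode above, so negative Euclidean distance is assumed here purely for illustration:

import numpy as np

def ib2(X, y):
    """Return the indices of the instances IB2 retains as its concept
    description (CD): exactly those that the CD built so far misclassifies."""
    cd = []                              # indices into X forming CD
    for i in range(len(X)):
        if not cd:                       # empty CD: store the first instance
            cd.append(i)
            continue
        # Most similar stored instance = nearest by Euclidean distance
        # (an assumed similarity; the pseudocode leaves it unspecified).
        dists = np.linalg.norm(X[cd] - X[i], axis=1)
        y_max = cd[int(np.argmin(dists))]
        if y[y_max] != y[i]:             # misclassified, so retain x
            cd.append(i)
    return cd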
Unlike IB1, the IB2 algorithm saves only the misclassified instances, most of
which lie between the ε-neighborhood and the ε-core of C (for some reasonably
small positive number ε) and are therefore close to the boundary. In cases
where the instances vary greatly in their distance from the concept boundary,
IB2's reduction in storage requirements is significant.
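As an illustration of this point, running the ib2 sketch above on a synthetic two-class data set whose classes are well separated (so that most instances lie far from the boundary) typically retains only a small fraction of the training set; the data below are invented for the demonstration:

rng = np.random.default_rng(0)
# Two well-separated Gaussian clusters; most points are far from the boundary.
X = np.vstack([rng.normal(-2.0, 1.0, (500, 2)),
               rng.normal(+2.0, 1.0, (500, 2))])
y = np.array([0] * 500 + [1] * 500)
perm = rng.permutation(len(X))           # present instances in random order
X, y = X[perm], y[perm]

cd = ib2(X, y)
print(f"retained {len(cd)} of {len(X)} instances")   # typically a few percent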
IB2's classification accuracy decreases more dramatically than IB1's as the
level of noise increases, because noisy instances are almost always
misclassified and therefore retained. Moreover, since IB2 saves only a small
fraction of the non-noisy training instances, the noisy instances it retains
are more likely to be used to generate poor classifications.
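Continuing the synthetic example above, this sensitivity can be observed by flipping a fraction of the training labels before rerunning ib2 (the 10% noise rate is arbitrary):

y_noisy = y.copy()
flip = rng.random(len(y)) < 0.10         # mislabel roughly 10% of the instances
y_noisy[flip] = 1 - y_noisy[flip]
cd_noisy = ib2(X, y_noisy)
print(f"retained {len(cd_noisy)} of {len(X)} instances")
# The retained set grows and is dominated by mislabeled points, so
# nearest-neighbor classifications drawn from it degrade accordingly.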