It is well known that, under general statistical assumptions, the nearest-neighbor
decision rule has a misclassification rate at most twice the optimal Bayes error
rate, even in the worst case. This result is relatively weak in that it is
asymptotic: it holds only as the number of samples grows without bound.
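For reference, the standard formalization of this claim is the Cover-Hart bound (not given explicitly in the text above): if \(R^{*}\) denotes the Bayes risk and \(R_{\mathrm{NN}}\) the asymptotic risk of the nearest-neighbor rule on a \(c\)-class problem, then
\[
R^{*} \le R_{\mathrm{NN}} \le R^{*}\left(2 - \frac{c}{c-1}\,R^{*}\right) \le 2R^{*}.
\]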
5.10.3 Reducing storage requirements
In the IB1 algorithm, only the instances that lie between the ε-neighborhood and
the ε-core of C contribute to producing an accurate approximation of the
target concept; the remaining instances play no role in determining the
concept boundary. Keeping only those useful instances would therefore save a
large amount of memory. Without full knowledge of the concept boundary,
however, this set is not known, but it can be approximated by the set of
misclassified instances. This is the key idea of the IB2 algorithm (Algorithm 5.2).
Algorithm 5.2: IB2 algorithm.
1. CD ← ∅ // CD = concept description
2. for each x ∈ Training Set do
3.   for each y ∈ CD do
4.     sim[y] ← similarity(x, y)
5.   y_max ← some y ∈ CD with maximal sim[y]
6.   if class(x) = class(y_max)
7.   then classification ← correct
8.   else
9.     classification ← incorrect
10.    CD ← CD ∪ {x}
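The following is a minimal Python sketch of this loop. The similarity function is not specified by the pseudocode above, so negative Euclidean distance is assumed here purely for illustration:

import numpy as np

def ib2(X, y):
    """Return the indices of the instances IB2 retains as its concept
    description (CD): exactly those that the CD built so far misclassifies."""
    cd = []                              # indices into X forming CD
    for i in range(len(X)):
        if not cd:                       # empty CD: store the first instance
            cd.append(i)
            continue
        # Most similar stored instance = nearest by Euclidean distance
        # (an assumed similarity; the pseudocode leaves it unspecified).
        dists = np.linalg.norm(X[cd] - X[i], axis=1)
        y_max = cd[int(np.argmin(dists))]
        if y[y_max] != y[i]:             # misclassified, so retain x
            cd.append(i)
    return cd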
Unlike IB1, the IB2 algorithm saves only the misclassified instances, most of
which lie between the ε-neighborhood and the ε-core of C (for some reasonably
small positive number ε) and are therefore close to the boundary. In cases
where the instances vary greatly in their distance from the concept boundary,
IB2's reduction in storage requirements is significant.
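As an illustration of this point, running the ib2 sketch above on a synthetic two-class data set whose classes are well separated (so that most instances lie far from the boundary) typically retains only a small fraction of the training set; the data below are invented for the demonstration:

rng = np.random.default_rng(0)
# Two well-separated Gaussian clusters; most points are far from the boundary.
X = np.vstack([rng.normal(-2.0, 1.0, (500, 2)),
               rng.normal(+2.0, 1.0, (500, 2))])
y = np.array([0] * 500 + [1] * 500)
perm = rng.permutation(len(X))           # present instances in random order
X, y = X[perm], y[perm]

cd = ib2(X, y)
print(f"retained {len(cd)} of {len(X)} instances")   # typically a few percent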
IB2's classification accuracy decreases more dramatically than IB1's as the
level of noise increases, because noisy instances are almost always
misclassified and therefore retained. Moreover, since IB2 saves only a small
fraction of the non-noisy training instances, the noisy instances it retains
are more likely to be used to generate poor classifications.
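Continuing the synthetic example above, this sensitivity can be observed by flipping a fraction of the training labels before rerunning ib2 (the 10% noise rate is arbitrary):

y_noisy = y.copy()
flip = rng.random(len(y)) < 0.10         # mislabel roughly 10% of the instances
y_noisy[flip] = 1 - y_noisy[flip]
cd_noisy = ib2(X, y_noisy)
print(f"retained {len(cd_noisy)} of {len(X)} instances")
# The retained set grows and is dominated by mislabeled points, so
# nearest-neighbor classifications drawn from it degrade accordingly.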