[Fig. 4. Effects on split variable. (a) Before edition of the training set: initial distribution (6,6); split based on X: (1,4), (4,1), (1,1); split based on Y: (2,4), (4,2). (b) After edition of the training set: new distribution (8,8); split based on X: (3,4), (4,3), (1,1); split based on Y: (2,6), (6,2).]
[Fig. 5. Effects on pruning. (a) Before training set editing; (b) after training set editing. Legend: instance misclassified according to k-NN; new decision border; duplicated instance.]
belonging to A, #instances belonging to B). The left side shows the original training set, along with the partitions induced by the variables X and Y. The information gain if X is chosen is (1 − 0.7683) = 0.2317, and if Y is chosen instead it is (1 − 0.9183) = 0.0817. So, X would be chosen as the variable to split on. After the training set edition, as shown in the right side of the figure, four instances are duplicated, two of them belonging to class A and the remaining two to class B. Now, the information gain if X is chosen is (1 − 0.9871) = 0.0129, and if Y is chosen instead it is (1 − 0.8113) = 0.1887. Variable Y would be chosen, leading to a different tree.
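The gains above can be verified directly. As a minimal sketch (the function names below are illustrative, not from the original), the weighted-entropy computation over the class distributions shown in Fig. 4 reproduces each figure:

```python
import math

def entropy(a, b):
    """Binary entropy (in bits) of a node holding a instances of class A and b of class B."""
    total = a + b
    h = 0.0
    for count in (a, b):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def info_gain(parent, partitions):
    """Information gain of splitting `parent` (a, b) into the given child partitions."""
    n = sum(parent)
    weighted = sum((a + b) / n * entropy(a, b) for (a, b) in partitions)
    return entropy(*parent) - weighted

# Before editing: distribution (6,6), so parent entropy is 1.
gain_x_before = info_gain((6, 6), [(1, 4), (4, 1), (1, 1)])  # ≈ 0.2317
gain_y_before = info_gain((6, 6), [(2, 4), (4, 2)])          # ≈ 0.0817

# After editing: distribution (8,8), parent entropy still 1.
gain_x_after = info_gain((8, 8), [(3, 4), (4, 3), (1, 1)])   # ≈ 0.0129
gain_y_after = info_gain((8, 8), [(2, 6), (6, 2)])           # ≈ 0.1887
```

Running this confirms that editing the training set flips the preferred split from X to Y.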
4.2 Change in the Pruning Decision
Figure 5 shows an example where a change in the pruning decision can occur. In the left subfigure, before the edition of the training set with duplication of the cases misclassified by k-NN, the density of examples belonging