is to be compared with the classification result obtained by the PAT approach for the same test instance. For this purpose, we propose an algorithm, which is presented below.^10
4.5.3 Instance Analysis Algorithm
Input: Inst: test instance; I: n training instances;
Output: for Inst: the frequency of nearest instances from the same class
        and the frequency of nearest instances from a different class;

Function Instance-Analysis(Inst: test instance,
                           I: array[1..n] of instances): Pc: array[1..2] of real;
Const
  near = 5;
Var
  nbSCL, nbDCL, k: integer;
  dis: real;
begin
  nbSCL := 0; nbDCL := 0;
  For k := 1 to n do
  begin
    dis := Distance(Inst, I[k]);
    If dis < near   {the two instances are nearest neighbours}
    then
      If (both Inst and I[k] are from the same class)
      then nbSCL := nbSCL + 1
      else nbDCL := nbDCL + 1;
  end; (* for k *)
  Pc[1] := nbSCL/(nbSCL+nbDCL);  (* proportion of nearest instances from the same class *)
  Pc[2] := nbDCL/(nbSCL+nbDCL);  (* proportion of nearest instances from a different class *)
  return(Pc);
end;
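The procedure above can be sketched as a small executable function. This is an illustrative Python version, not the authors' implementation: Euclidean distance stands in for the distance function of Equation 4.5, and the names `instance_analysis`, `near`, and the toy data are our own.

```python
import math

def instance_analysis(inst, inst_label, training, labels, near=5.0):
    """Return (same-class proportion, different-class proportion) of the
    training instances lying within distance `near` of `inst`."""
    # Euclidean distance is a stand-in for the distance of Equation 4.5.
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    nb_same = nb_diff = 0
    for x, label in zip(training, labels):
        if distance(inst, x) < near:      # x counts as a nearest neighbour
            if label == inst_label:
                nb_same += 1
            else:
                nb_diff += 1
    total = nb_same + nb_diff
    if total == 0:                        # no neighbour within `near`
        return (0.0, 0.0)
    return (nb_same / total, nb_diff / total)

# Toy two-class example: two close same-class instances, one distant one.
training = [(0.0, 0.0), (1.0, 1.0), (9.0, 9.0)]
labels   = ["a", "a", "b"]
pc = instance_analysis((0.5, 0.5), "a", training, labels, near=5.0)
print(pc)  # -> (1.0, 0.0): both neighbours within distance 5 share class "a"
```

The guard against `total == 0` is an addition over the pseudocode, which would otherwise divide by zero when no training instance falls within `near`.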
In the above algorithm, we present only the treatment of two-class problems. However, in our experiments we also deal with multi-class problems. The constant near is fixed by the user: we consider two instances to be nearest neighbours if the distance between them is lower than near. For a test instance, this algorithm tells us statistically the proportion of its nearest instances belonging to each class. We then compare this frequency with the classification result obtained by the PAT approach for the same test instance.
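This comparison step can be sketched as follows. The sketch is self-contained and hypothetical: `pat_prediction` stands in for the class returned by the PAT approach (whose internals are not given here), and the two proportions are those produced by the Instance Analysis Algorithm.

```python
def neighbourhood_class(pc_same, pc_diff, own_class, other_class):
    """Class favoured by the nearest-instance frequencies: the test
    instance's own class if most of its nearest instances share it."""
    return own_class if pc_same >= pc_diff else other_class

def agrees_with_pat(pat_prediction, pc_same, pc_diff, own_class, other_class):
    # Does PAT's prediction match the class favoured by the
    # neighbourhood proportions Pc[1] and Pc[2]?
    return pat_prediction == neighbourhood_class(
        pc_same, pc_diff, own_class, other_class)

# Example: 80% of the nearest instances share the test instance's class
# "democrat"; a PAT prediction of "democrat" would therefore agree.
print(agrees_with_pat("democrat", 0.8, 0.2, "democrat", "republican"))  # -> True
```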
Results: To illustrate our experience with the Instance Analysis Algorithm proposed above, we present only the results of testing this algorithm on three examples from the vote database, as shown in Table 4.13. The vote database
^10 This algorithm is a K-Nearest-Neighbour method from instance-based learning. We use the distance function of Equation 4.5 because it is the most appropriate for our problem.