Table 4.12. The Root Mean Squared Error

DataBase       PAT        C4.5       OAT
Vote           0.310443   0.52039    0.315079
Nursery        0.4456728  0.44999    0.436149
Lymphography   0.260477   0.477835   0.420603
Mushroom       0.412543   0.643147   0.535865
Zoo            0.133817   0.44812    0.245
4.5 Measure of the Quality of the Classification Results
Our approach is based on the dependence between attributes, and the
classification results it produces are probabilistic. To improve the performance
of our approach, we measured the quality of these classification results.
For this purpose, we considered an algorithm called Relief [12], which has been
shown to be very efficient at estimating the quality of attributes. We were
interested in Relief because it relies entirely on statistical analysis and
employs few heuristics. The classical measures for classification (footnote 8)
evaluate the quality of an attribute with respect to the class independently of
the context of other attributes [25]. Relief, in contrast, takes the context of
other attributes into account when estimating the quality of an attribute with
respect to the class. The basic idea of Relief, when analysing training
instances, is to take into account not only the difference in attribute values
and the difference in classes, but also the distance between instances. In this
section, we first present the algorithm Relief, its extension ReliefF, and the
distance function used to calculate the distance between two instances. We then
propose an algorithm which calculates, for each test instance in the test data,
the frequency of its nearest instances from each class. Finally, we give some
examples.
4.5.1 Relief
The key idea of Relief is to estimate attributes according to how well their values
distinguish among instances that are close to each other. For that purpose, given
a randomly selected instance R from m instances, Relief [14] searches for its two
nearest neighbors: one H from the same class and the other M from a different
class. It uses a function diff that calculates the difference between the values
of an attribute A for two instances. For a discrete attribute, this difference is
1 when the values are different and 0 when they are equal. The quality estimate
W[A] of attribute A is updated as follows:
W[A] = W[A] - diff(A, R, H)/m + diff(A, R, M)/m    (4.3)
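The update rule (4.3) can be illustrated with a minimal Python sketch for discrete attributes. It is only an illustration under stated assumptions, not the authors' implementation: the function names `diff`, `distance` and `relief` are hypothetical, the distance between two instances is assumed to be the sum of `diff` over all attributes (the text introduces its own distance function separately), and instances are assumed to be sampled with replacement.

```python
import random


def diff(a, x, y):
    """Difference for discrete attribute a: 1 if the values differ, else 0."""
    return 0 if x[a] == y[a] else 1


def distance(x, y):
    """Assumed distance: number of attributes on which x and y differ."""
    return sum(diff(a, x, y) for a in range(len(x)))


def relief(instances, labels, m, seed=0):
    """Estimate attribute qualities W[A] by applying update (4.3) m times.

    Assumes every class has at least two instances and that there are
    at least two classes, so a nearest hit and a nearest miss always exist.
    """
    rng = random.Random(seed)
    n_attrs = len(instances[0])
    w = [0.0] * n_attrs
    for _ in range(m):
        # Randomly select an instance R (sampling with replacement: assumption).
        i = rng.randrange(len(instances))
        r, r_class = instances[i], labels[i]
        # Candidates for the nearest hit H (same class) and nearest miss M.
        hits = [x for j, x in enumerate(instances)
                if j != i and labels[j] == r_class]
        misses = [x for j, x in enumerate(instances) if labels[j] != r_class]
        hit = min(hits, key=lambda x: distance(r, x))
        miss = min(misses, key=lambda x: distance(r, x))
        # W[A] = W[A] - diff(A, R, H)/m + diff(A, R, M)/m for every attribute.
        for a in range(n_attrs):
            w[a] += (diff(a, r, miss) - diff(a, r, hit)) / m
    return w
```

On a toy data set such as `relief([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 1, 1], m=8)`, where the first attribute equals the class and the second is irrelevant, the first attribute receives a positive estimate and the second a negative one, which matches the intent of rewarding attributes that separate near neighbors of different classes.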
Relief updates the quality estimate W[A] for every attribute A depending
on its values for R, M and H. This is repeated m times according to the m
8 Such as information gain, gain ratio, distance measure, Gini index, etc.
 