The Stopping Condition
Regarding the stopping condition and the choice of the best attribute for the
split, we must measure the diversity of decision values for objects appearing in
a node. This can be achieved through global or local approaches. The global
approach depends on how similar the decision value of an object is to the class
label of the node. The class label of the node can be seen as the average of
decision values for objects appearing in that node. Therefore, we must measure
the similarity between two fuzzy subsets of $V_m$. There are many proposals for
measuring the similarity of two fuzzy sets, so we do not specify what the
similarity function is. We simply assume that $\mathrm{sim}$ is a similarity function
mapping two membership functions, $\mu_1$ and $\mu_2$, into a number
$\mathrm{sim}(\mu_1, \mu_2) \in [0, 1]$. Several proposals of similarity functions are
reviewed in the Appendix. Let $x, y \in U$ be objects and $s$ be a node in the
tree $T$. We also write $\mathrm{sim}(x, s)$ for $\mathrm{sim}(\mu_{f_m(x)}, \mu_s)$ and
$\mathrm{sim}(x, y)$ for $\mathrm{sim}(\mu_{f_m(x)}, \mu_{f_m(y)})$.
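Since the text deliberately leaves the similarity function open, the following is only an illustrative sketch of one possible choice, not the function used by the authors. It assumes a finite decision domain, represents a fuzzy subset of $V_m$ as a dictionary from decision values to membership degrees, and uses a Jaccard-style ratio of sigma counts (sum of minima over sum of maxima). The names `FuzzySet` and `sim` and the dictionary representation are assumptions made for illustration.

```python
from typing import Dict, Hashable

# Assumption: a fuzzy subset of V_m is a mapping from each decision value
# to its membership degree in [0, 1]; missing values have degree 0.
FuzzySet = Dict[Hashable, float]


def sim(mu1: FuzzySet, mu2: FuzzySet) -> float:
    """Jaccard-style similarity of two fuzzy sets (one possible choice).

    Returns sum(min) / sum(max) over all decision values, which lies in [0, 1].
    Two empty fuzzy sets are treated as identical (similarity 1.0).
    """
    values = set(mu1) | set(mu2)
    inter = sum(min(mu1.get(v, 0.0), mu2.get(v, 0.0)) for v in values)
    union = sum(max(mu1.get(v, 0.0), mu2.get(v, 0.0)) for v in values)
    return 1.0 if union == 0.0 else inter / union


if __name__ == "__main__":
    a = {"low": 1.0, "high": 0.5}
    b = {"low": 0.5, "high": 0.5}
    print(sim(a, a))  # identical sets
    print(sim(a, b))  # partially overlapping sets
```

Any other function with values in $[0, 1]$ could be substituted without changing the definitions that follow.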
The diversity of decision values for objects appearing in a node $s$ can be
measured by aggregating $\mathrm{sim}(x, s)$ for all $x$ left at $s$. The aggregated result is
called the global degree of concentration, and we denote the global degree of
concentration of a node $s$ by $gdc_s$. There are at least two ways to define $gdc_s$.
First, by qualitative means:
\[
gdc_s = \min_{x \in U} \bigl( \mu_{U_s}(x) \rightarrow \mathrm{sim}(x, s) \bigr), \tag{4}
\]
and, second, by quantitative means,
\[
gdc_s = \sum_{x \in U} \frac{\mu_{U_s}(x)}{SC} \cdot \mathrm{sim}(x, s), \tag{5}
\]
where $SC$ is the sigma count of $U_s$ as defined above. A smaller $gdc_s$ value
indicates more diverse decision values among the objects appearing in $s$. The
qualitative $gdc_s$ measures the degree of truth of the statement “the decision class
of every object left at $s$ is similar to the label of $s$” in a fuzzy logic sense.
The quantitative $gdc_s$ measures the average similarity of the decision class of
every object left at $s$ to the label of $s$. Note that to calculate the global degree
of concentration, we have to assign labels not only to leaf nodes, but also to
internal nodes. The label assignment procedure for internal nodes is exactly
the same as that introduced in Sect. 4.1.
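As a rough illustration of the two definitions of $gdc_s$, the sketch below computes both from precomputed values. It assumes that the membership degrees $\mu_{U_s}(x)$ and the similarities $\mathrm{sim}(x, s)$ are already available as parallel lists, and it uses the Łukasiewicz implication for the qualitative version; that operator is an assumption chosen for the example, since this section does not fix one. The function names are hypothetical.

```python
from typing import Sequence


def implies(a: float, b: float) -> float:
    # Assumption: Lukasiewicz implication min(1, 1 - a + b).
    # Other fuzzy implications could be used instead.
    return min(1.0, 1.0 - a + b)


def gdc_qualitative(memberships: Sequence[float], sims: Sequence[float]) -> float:
    """Qualitative gdc: minimum over objects of mu_{U_s}(x) -> sim(x, s)."""
    return min(implies(m, s) for m, s in zip(memberships, sims))


def gdc_quantitative(memberships: Sequence[float], sims: Sequence[float]) -> float:
    """Quantitative gdc: membership-weighted average of sim(x, s)."""
    sigma_count = sum(memberships)  # SC, the sigma count of U_s
    if sigma_count == 0.0:
        raise ValueError("node contains no objects (sigma count is zero)")
    return sum(m * s for m, s in zip(memberships, sims)) / sigma_count


if __name__ == "__main__":
    mu_us = [1.0, 0.5, 0.0]      # hypothetical memberships of three objects in U_s
    sim_to_s = [0.8, 0.4, 0.1]   # hypothetical similarities sim(x, s)
    print(gdc_qualitative(mu_us, sim_to_s))
    print(gdc_quantitative(mu_us, sim_to_s))
```

In this sketch an object with zero membership in $U_s$ affects neither measure: its implication evaluates to 1 and its weight in the average is 0.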
As suggested in [2], we can also measure the diversity of decision values
of objects appearing in a node $s$ by using the (average) mutual similarity
between the decision values. This is called the local degree of concentration,
denoted by $ldc_s$, and can also be defined in two ways, i.e., by qualitative
means:
\[
ldc_s = \min_{1 \le i < j \le n} \bigl( \mu_{U_s}(x_i) \otimes \mu_{U_s}(x_j) \rightarrow \mathrm{sim}(x_i, x_j) \bigr); \tag{6}
\]