A Bipolar Interpretation of Fuzzy Decision Trees - Data Mining: Foundations and Practice


The Stopping Condition

Regarding the stopping condition and the choice of the best attribute for the split, we must measure the diversity of decision values for objects appearing in a node. This can be achieved through global or local approaches. The global approach depends on how similar the decision value of an object is to the class label of the node. The class label of the node can be seen as the average of the decision values for objects appearing in that node. Therefore, we must measure the similarity between two fuzzy subsets of $V_m$. There are many proposals for measuring the similarity of two fuzzy sets, so we do not specify what the similarity function is. We simply assume that $sim$ is a similarity function mapping two membership functions, $\mu_1$ and $\mu_2$, into a number $sim(\mu_1, \mu_2) \in [0, 1]$. Several proposals of similarity functions are reviewed in the Appendix.
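Because the section deliberately leaves $sim$ open, here is one concrete choice for illustration only; it is not necessarily among the proposals reviewed in the Appendix. The sketch assumes that $V_m$ is finite, so a fuzzy subset of $V_m$ can be stored as a list of membership degrees in a fixed order, and it uses the fuzzy Jaccard ratio $\sum_v \min(\mu_1(v), \mu_2(v)) / \sum_v \max(\mu_1(v), \mu_2(v))$, which always lies in $[0, 1]$. The function name `jaccard_sim` is hypothetical.

```python
from typing import Sequence


def jaccard_sim(mu1: Sequence[float], mu2: Sequence[float]) -> float:
    """Fuzzy Jaccard similarity of two fuzzy subsets of a finite domain.

    Each argument lists membership degrees in [0, 1], one per element of
    the (assumed finite) decision domain V_m, in the same order.
    """
    if len(mu1) != len(mu2):
        raise ValueError("membership vectors must cover the same domain")
    inter = sum(min(a, b) for a, b in zip(mu1, mu2))  # sigma count of the intersection
    union = sum(max(a, b) for a, b in zip(mu1, mu2))  # sigma count of the union
    # Two empty fuzzy sets are treated as identical (an assumption).
    return 1.0 if union == 0 else inter / union
```

For instance, `jaccard_sim([1.0, 0.0], [0.5, 0.5])` gives 0.5 / 1.5 = 1/3, and identical fuzzy sets score 1.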

Let $x, y \in U$ be objects and $s$ be a node in the tree $T$. We also write $sim(x, s)$ for $sim(\mu_{f_m(x)}, \mu_s)$ and $sim(x, y)$ for $sim(\mu_{f_m(x)}, \mu_{f_m(y)})$.

The diversity of decision values for objects appearing in a node $s$ can be measured by aggregating $sim(x, s)$ over all $x$ left at $s$. The aggregated result is called the global degree of concentration, and we denote the global degree of concentration of a node $s$ by $gdc_s$. There are at least two ways to define $gdc_s$. First, by qualitative means:

$$gdc_s = \min_{x \in U} \big( \mu_{U_s}(x) \to_{\otimes} sim(x, s) \big), \tag{4}$$

and, second, by quantitative means:

$$gdc_s = \sum_{x \in U} \frac{\mu_{U_s}(x)}{SC} \cdot sim(x, s), \tag{5}$$

where $SC = \sum_{x \in U} \mu_{U_s}(x)$ is the sigma count of $U_s$ as defined above. A smaller $gdc_s$ value indicates more diverse decision values among the objects appearing in $s$. The qualitative $gdc_s$ measures the degree of truth, in a fuzzy-logic sense, of the statement "the decision class of every object left at $s$ is similar to the label of $s$". The quantitative $gdc_s$ measures the membership-weighted average similarity of the decision class of every object left at $s$ to the label of $s$. Note that to calculate the global degree of concentration, we have to assign labels not only to leaf nodes but also to internal nodes. The label assignment procedure for internal nodes is exactly the same as that introduced in Sect. 4.1.
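To make Eqs. (4) and (5) concrete, the following Python sketch computes both versions of $gdc_s$ for a single node. Several pieces are assumptions rather than part of the method described here: $V_m$ is taken to be finite; $\to_{\otimes}$ is taken to be the Łukasiewicz implication $\min(1, 1 - a + b)$; $sim$ is taken to be $1 - \max_v |\mu_1(v) - \mu_2(v)|$; and `node_label` is a hypothetical stand-in for the label assignment of Sect. 4.1, using the membership-weighted average of the objects' decision fuzzy sets. All function names are illustrative.

```python
from typing import Callable, Sequence

Fuzzy = Sequence[float]  # membership degrees over a finite V_m (assumed)


def luk_impl(a: float, b: float) -> float:
    """Lukasiewicz implication, an assumed choice for the implication in Eq. (4)."""
    return min(1.0, 1.0 - a + b)


def cheb_sim(mu1: Fuzzy, mu2: Fuzzy) -> float:
    """Assumed similarity: 1 minus the largest pointwise membership difference."""
    return 1.0 - max(abs(a - b) for a, b in zip(mu1, mu2))


def node_label(members: Sequence[float], decisions: Sequence[Fuzzy]) -> list[float]:
    """Hypothetical node label: membership-weighted average of the decision fuzzy sets."""
    sc = sum(members)  # sigma count of U_s
    return [sum(m * d[k] for m, d in zip(members, decisions)) / sc
            for k in range(len(decisions[0]))]


def gdc_qualitative(members: Sequence[float], decisions: Sequence[Fuzzy], label: Fuzzy,
                    sim: Callable[[Fuzzy, Fuzzy], float] = cheb_sim,
                    impl: Callable[[float, float], float] = luk_impl) -> float:
    """Eq. (4): minimum over x of mu_{U_s}(x) -> sim(x, s)."""
    return min(impl(m, sim(d, label)) for m, d in zip(members, decisions))


def gdc_quantitative(members: Sequence[float], decisions: Sequence[Fuzzy], label: Fuzzy,
                     sim: Callable[[Fuzzy, Fuzzy], float] = cheb_sim) -> float:
    """Eq. (5): sum over x of (mu_{U_s}(x) / SC) * sim(x, s)."""
    sc = sum(members)
    return sum(m / sc * sim(d, label) for m, d in zip(members, decisions))
```

With `members = [1.0, 0.5]` and `decisions = [[1.0, 0.0], [0.0, 1.0]]`, the label is about [0.667, 0.333], the qualitative value is 2/3 and the quantitative value is 5/9. A node whose objects all share the decision [1.0, 0.0] scores 1.0 under both, consistent with a smaller $gdc_s$ signalling more diverse decision values. Passing a different `sim` or `impl` swaps in another similarity measure or implication.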

As suggested in [2], we can also measure the diversity of decision values of objects appearing in a node $s$ by using the (average) mutual similarity between the decision values. This is called the local degree of concentration, denoted by $ldc_s$, and it can also be defined in two ways, i.e., by qualitative means:

$$ldc_s = \min_{1 \le i < j \le n} \big( (\mu_{U_s}(x_i) \otimes \mu_{U_s}(x_j)) \to_{\otimes} sim(x_i, x_j) \big); \tag{6}$$
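Eq. (6) can be sketched the same way. The minimum runs over all index pairs $i < j$, which `itertools.combinations` enumerates directly. As before, the choice of $\otimes$ (here the Łukasiewicz t-norm $\max(0, a + b - 1)$, with its residuum as $\to_{\otimes}$), the similarity $1 - \max_v |\mu_1(v) - \mu_2(v)|$, and the value 1 returned for nodes with fewer than two objects are assumptions made for illustration; the function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def luk_tnorm(a: float, b: float) -> float:
    """Lukasiewicz t-norm, assumed here as the conjunction in Eq. (6)."""
    return max(0.0, a + b - 1.0)


def luk_impl(a: float, b: float) -> float:
    """Residuum of the Lukasiewicz t-norm, used as the implication."""
    return min(1.0, 1.0 - a + b)


def sim(mu1: Sequence[float], mu2: Sequence[float]) -> float:
    """Assumed similarity on a finite V_m: 1 minus the largest pointwise difference."""
    return 1.0 - max(abs(a - b) for a, b in zip(mu1, mu2))


def ldc_qualitative(members: Sequence[float],
                    decisions: Sequence[Sequence[float]]) -> float:
    """Eq. (6): min over pairs i < j of (mu(x_i) t-norm mu(x_j)) -> sim(x_i, x_j)."""
    pairs = combinations(range(len(members)), 2)  # all index pairs with i < j
    return min(
        (luk_impl(luk_tnorm(members[i], members[j]), sim(decisions[i], decisions[j]))
         for i, j in pairs),
        default=1.0,  # fewer than two objects: vacuously concentrated (assumption)
    )
```

For two objects with memberships 1.0 and 0.5 and disjoint crisp decisions, the conjunction is 0.5, the similarity is 0, and the implication gives $ldc_s = 0.5$; with both memberships at 1.0 the same disagreement drives $ldc_s$ to 0.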
