Database Reference
In-Depth Information
5
6
5
6
1
6
1
6
=
log
log
E View Water
(
)
= −
0 195676
.
,
0
3
0
3
2
3
2
3
1
3
1
3
= 0 276434
log
log
log
E View Mountain
(
)
.
,
0
3
0
3
2
3
2
3
1
3
1
3
=
log
log
log
E View GreenBelt
(
)
= −
0 276434,
.
0
3
0
3
2
3
2
3
1
3
1
3
= 0 276434
log
log
log
E View Street
(
)
.
.
Next,
E (View) = 6/15 * 0.195676 + 3/15 * 0.276434+3/15 * 0.276434 + 3/15 * 0.276434 = 0.2441308.
Thus, the gain
g(View) = E ( T ) - E (View) = 0.2271622.
Analogously,
g(Schooldistrict) = 0.2927512, g(Livingarea) = 0.1100572, g(SqFt) = 0.251855,
where, we choose the value '987' as the partition value.
Finally, the attribute “Schooldistrict” will be selected as the first level partition attribute of decision
tree T by using the C4.5 algorithm.
However, the main difference between our algorithm and the existing decision tree construction
algorithm is how to compute the gain of a partition. Our approach wants to reduce the category cost of
visiting intermediate nodes (includes the tuples in them if user choose to explore them) and the cost of
visiting tuples in the leaves. Our following analysis will show that information gain ignores the cost
of visiting tupes, and the existing category tree construction algorithm proposed by Chakrabarti et. al.
(Chakrabarti, Chaudhuri & Hwang, 2004) ignores the cost of visiting intermediate nodes generated by
future partitions while the category tree construction algorithm proposed by Chen et. al. (Chen & Li,
2007) ignores the cost of visiting tuples in the intermediate nodes.
Cost Estimation
Cost of Visiting Leave
Let v be the node to be partitioned and N ( v ) be the number of tuples in v . Let v 1 and v 2 be children gen-
erated by a partition. Let P i be the probability that users are interested in cluster C i . The gain equals the
reduction of category cost when v is partitioned into v 1 and v 2 . Thus based on the category cost defined
in Definition 2, the reduction of the cost of visiting tuples due to partition v into v 1 and v 2 equals
Search WWH ::




Custom Search