Database Reference
In-Depth Information
Partitioning for Categorical Attribute
For a categorical attribute, a new subcategory will be created with one branch for each value of the at-
tribute, and the information gain (we will discuss how to compute the information gain in Section 5.2)
will be computed over that subcategory. If a categorical attribute have too many values and thus generate
too many branches, we can add intermediate levels to that attribute. The categorical attribute can only
generate one possible partition, and it will be removed from A R if it is selected as the partition attribute.
Partitioning for Numeric Attribute
For a numeric attribute A i , we use binary partition, i.e., A i v or A i > v . For a numerical value attribute, the
algorithm will generate one subcategory for every possible partition point, and compute the information
gain for that partition point. The best partition point will be selected and the gain of the best partition
Algorithm 6. The category tree building algorithm
Function : BuildTree ( A R , D Q , C , λ)
Input: A R is the set of attributes retained after eliminating, D Q is the query
results, C is the class labels assigned in the clustering step for each tuple
in D Q , λ and is a user defined stopping threshold.
Output: a category tree T .
1. Create a root r .
2. If all tuples in D Q have the same class label stop.
3. For each attribute A i A R .
4. If A i is a categorical attribute then
5. For each value v of A i , create a branch under the current root. Add
those tuples with “ A i = v ” to that branch.
6. Compute the attribute A i 's gain g( A i , T i ) where T i is the subcategory
created.
7. Else
8. For each value v of A i , create a tree T i v with r as the root and two
branches, one for those tuples with A i v or A i , one for those tuples with A i >
v or A i .
9. Compute the gain g( A i , T i v ) for each partition point v and choose the
maximal one as g( A i , T i ).
10. End If
11. End For
12. Choose the attribute A j with the maximal g( A j , T j ), remove A j from A H if A j
is categorical.
13. If g( A j , T j )> λ then
14. Replace r with the subcategory T j , for each leaf n k in T j with tuples in
D Qk ,
BuildTree ( A H , D Qk , C , λ).
15. End If
Search WWH ::




Custom Search