Table 10.15 Subtree association rule evaluation for Academic Institution data (10% support, 50% confidence)

Type of analysis  Data partition  FullTree                    Embedded                    Induced
                                  AR%           CR%           AR%           CR%           AR%           CR%
Initial Rules     Training        64.27 (232)   100.00 (232)  64.54 (232)   100.00 (232)  64.54 (232)   100.00 (232)
                  Testing         70.06 (232)   100.00 (232)  70.55 (232)   100.00 (232)  64.54 (232)   100.00 (232)
Rules after ST    Training        75.19 (43)    73.95 (43)    74.94 (43)    73.95 (43)    74.94 (43)    73.95 (43)
                  Testing         74.94 (43)    74.09 (43)    74.84 (43)    74.09 (43)    74.84 (43)    74.09 (43)
Chi-Square        Training        78.21 (11)    64.47 (11)    77.56 (10)    64.47 (10)    77.56 (10)    64.47 (10)
                  Testing         74.96 (11)    60.12 (11)    74.58 (10)    61.02 (10)    74.58 (10)    61.02 (10)

with a proper sequence of parameter usage, including the ST feature selection,
statistical analysis and the redundancy assessment method. According to Table 10.15,
as the number of rules is reduced for the FullTree, Embedded and Induced rule sets
for the Academic Institution Weblogs (10% support), the AR increases, but at the
cost of a decrease in CR. One can also notice that the AR for the FullTree rule set
is initially slightly lower than the AR of the Embedded and Induced rule sets, but
after Symmetrical Tau is applied, the accuracy of FullTree is higher and remains
higher after chi-square rule filtering. Note that for this data no further
rules were removed via logistic regression and the redundancy check, and hence these
stages are not shown in Table 10.15.
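
The two filtering stages reported in Table 10.15 (Symmetrical Tau, then chi-square) can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes the commonly cited contingency-table form of Symmetrical Tau (the symmetric variant of the Goodman-Kruskal tau) and the standard Pearson chi-square statistic. The function names and the toy counts are hypothetical and are not taken from the Academic Institution data.

```python
from typing import List, Sequence, Tuple

Table = Sequence[Sequence[float]]  # rows: attribute values, columns: class values


def _marginals(table: Table) -> Tuple[float, List[float], List[float]]:
    """Return the grand total and the row/column marginal probabilities."""
    total = float(sum(sum(row) for row in table))
    if total <= 0:
        raise ValueError("contingency table must contain at least one observation")
    p_row = [sum(row) / total for row in table]
    p_col = [sum(col) / total for col in zip(*table)]
    return total, p_row, p_col


def symmetrical_tau(table: Table) -> float:
    """Symmetrical Tau of a contingency table of counts (assumed standard form)."""
    total, p_row, p_col = _marginals(table)
    sq_row = sum(p * p for p in p_row)
    sq_col = sum(p * p for p in p_col)
    denom = 2.0 - sq_row - sq_col
    if denom == 0.0:  # both variables are constant: no association to measure
        return 0.0
    by_col = by_row = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count == 0:
                continue  # empty cells contribute nothing
            p_ij = count / total
            by_col += p_ij * p_ij / p_col[j]
            by_row += p_ij * p_ij / p_row[i]
    return (by_col + by_row - sq_row - sq_col) / denom


def chi_square(table: Table) -> float:
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    total, p_row, p_col = _marginals(table)
    stat = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = total * p_row[i] * p_col[j]
            if expected > 0:
                stat += (count - expected) ** 2 / expected
    return stat


# Toy 2x2 table (hypothetical counts): rows = antecedent present/absent,
# columns = consequent class present/absent.
rule_table = [[30, 10], [10, 50]]
CRITICAL_DF1_ALPHA_05 = 3.841  # chi-square critical value, 1 degree of freedom, 5% level

print("Symmetrical Tau:", symmetrical_tau(rule_table))
print("keep rule:", chi_square(rule_table) > CRITICAL_DF1_ALPHA_05)
```

In this sketch an attribute with a Symmetrical Tau close to 0 carries little information about the class and would be a candidate for removal, while a rule whose chi-square statistic falls below the critical value would be filtered out as statistically insignificant. The cut-off used for Symmetrical Tau in the chapter's experiments is not stated in this section, so none is assumed here.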
10.6 Conclusion and Future Work
The work presented in this chapter has explored the application of a number
of statistical methods to optimize subtree-based associative classification for
tree-structured data. It has utilized a structure-preserving flat format representation
to progressively apply a number of statistical methods, first filtering out irrelevant
attributes and then removing irrelevant and redundant rules. One implication of this
method is that the subtree-based association rules are restricted to
those that occur at the same position in the original tree database, and that the initial
rule set (before subtree reconstruction) can contain rules based on disconnected
subtrees. Experiments were performed on three real datasets, and using the proposed
approach a large number of rules were removed in both cases without negatively
affecting the accuracy of the rule set, while for more structurally varied data, this
 
 