Table 10.11 Subtree association rule evaluation for CSLOG data (1% support, 50% confidence)

Type of analysis      Data partition   FullTree           Embedded           Induced
                                       AR%      CR%       AR%      CR%       AR%      CR%
Initial rules         Training         68.09    98.59     68.12    98.59     68.11    98.59
                                       (13835)  (13835)   (13834)  (13834)   (13810)  (13810)
                      Testing          69.94    98.6      69.94    98.6      69.94    98.6
Rules after ST        Training         69.94    98.59     70.02    98.59     70.02    98.59
                                       (6084)   (6084)    (6083)   (6083)    (6081)   (6081)
                      Testing          72.01    98.6      72.1     98.6      72.1     98.6
Chi-Square            Training         79.22    48.97     79.02    48.39     78.41    48.39
                                       (73)     (73)      (72)     (72)      (65)     (65)
                      Testing          78.78    48.77     78.57    48.25     78.06    48.25
Logistic regression   Training         79.22    48.97     79.02    48.39     78.41    48.39
                                       (73)     (73)      (71)     (71)      (64)     (64)
                      Testing          78.78    48.77     78.57    48.25     78.06    48.25
Redundancy removal    Training         79.02    48.97     78.71    48.97     78.71    48.97
                                       (61)     (61)      (54)     (54)      (54)     (54)
                      Testing          78.53    48.77     78.53    48.77     78.53    48.77

there is a significant reduction in the number of rules. While an increase in AR can
be observed, this comes at the cost of reduced CR. The characteristics of
the FullTree rule set are similar to those of the Embedded and Induced rule sets,
and the AR and CR are very similar or identical across the different rule sets. This is
because the rules in the Embedded and Induced rule sets are subsets of FullTree, and
in this dataset there were few variations among the rule sets with respect to the level of
embedding in subtrees or frequent patterns that produce disconnected subtrees. To
conclude, the increase in prediction/classification accuracy comes with a trade-off,
since fewer instances are captured from the datasets. On the positive side, a smaller
rule set is expected to generalize better and is easier for
the user to understand and utilize for decision support purposes.
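
The trade-off between AR and CR described above can be illustrated with a small sketch. It is not the evaluation procedure used in this chapter: it assumes that CR is the percentage of instances matched by at least one rule and that AR is the percentage of those matched instances whose class is predicted correctly, and it simplifies subtree rules to plain item sets. The names Rule, predict and evaluate, and the toy data, are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    # Assumption: a rule antecedent is simplified to a set of items,
    # whereas the rules in the text are subtrees.
    antecedent: FrozenSet[str]
    label: str
    confidence: float


Instance = Tuple[FrozenSet[str], str]  # (features, true class)


def predict(rules: List[Rule], features: FrozenSet[str]) -> Optional[str]:
    """Return the class of the highest-confidence matching rule, or None."""
    matching = [r for r in rules if r.antecedent <= features]
    if not matching:
        return None
    return max(matching, key=lambda r: r.confidence).label


def evaluate(rules: List[Rule], data: List[Instance]) -> Tuple[float, float]:
    """Return (AR%, CR%) under the assumed definitions stated above."""
    covered = 0
    correct = 0
    for features, label in data:
        predicted = predict(rules, features)
        if predicted is None:
            continue  # instance not covered by any rule
        covered += 1
        if predicted == label:
            correct += 1
    cr = 100.0 * covered / len(data) if data else 0.0
    ar = 100.0 * correct / covered if covered else 0.0
    return ar, cr


# Toy data (hypothetical): four instances, two classes.
data: List[Instance] = [
    (frozenset({"a"}), "x"),
    (frozenset({"a"}), "y"),
    (frozenset({"b"}), "y"),
    (frozenset({"c"}), "x"),
]
full_rules = [
    Rule(frozenset({"a"}), "x", 0.6),
    Rule(frozenset({"b"}), "y", 0.9),
]
# A stricter filter keeps only the high-confidence rule.
filtered_rules = [r for r in full_rules if r.confidence >= 0.8]

ar_full, cr_full = evaluate(full_rules, data)
ar_filtered, cr_filtered = evaluate(filtered_rules, data)
```

In this toy case the filtered rule set covers fewer instances but classifies the ones it covers more accurately, which mirrors the pattern in Table 10.11 in direction only, not in magnitude.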
Comparison with XRules for varying support thresholds. In Table 10.12 we
compare the AR and CR of the final rule sets of FullTree with those of the XRules approach for
varying support thresholds. Note that the approaches are fairly different in terms of
the rule filtering performed in the process. Nevertheless, the comparison
serves mainly as a benchmark for the kind of accuracy and coverage rate that can be
expected when basing the classification on frequent patterns/subtrees extracted using
the support and confidence thresholds. As such, in no way do the results indicate
that one approach performs better than the other, as their internal mechanisms are
not directly comparable. The XRules approach is based on the TreeMiner [51] algorithm for
extracting ordered embedded subtrees, and hence the number of rules extracted at
varying support thresholds is larger (shown in brackets), since the likelihood that a
subtree will be frequent when it does not need to occur at the same position is much