optimization came at the cost of a large reduction in coverage rate. The results on
this data were compared with those of a structural classifier based on traditional subtrees,
and some important differences and implications were highlighted. The results show
that associations based on disconnected subtrees can be useful, and that the positional
constraint can often result in more precise rules for structurally varied data, although at
the cost of a lower coverage rate. From these findings one can conclude that, when
forming association rules for tree-structured data, one should not be constrained to a
valid and connected subtree, because an interesting association can occur anywhere in a
tree instance and does not need to form a connected subtree of that instance. These
findings indicate that including disconnected subtrees, and constraining the subtrees
by their exact occurrence in the database, in addition to traditional subtree patterns,
could improve classifiers for tree-structured data. The method used in this chapter
is to be seen as complementary to, and in no way a replacement for, the traditional way
in which subtrees are mined.
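To make the distinction concrete, the following is a minimal sketch in Python (with hypothetical names and a deliberately simplified tree representation, not the chapter's algorithm). It checks whether a set of labels occurs anywhere in a tree instance, irrespective of connectivity, versus whether some matching of those labels forms a connected subtree of the instance.

from itertools import product

# Each tree instance: node id -> (label, parent id); the root's parent is None.
tree = {0: ('r', None), 1: ('a', 0), 2: ('b', 0), 3: ('c', 2)}

def labels_present(tree, labels):
    # Disconnected view: the association holds as soon as every label
    # occurs somewhere in the instance, regardless of how the nodes connect.
    found = {label for (label, _parent) in tree.values()}
    return set(labels) <= found

def connected_occurrence(tree, labels):
    # Connected view (simplified): some assignment of the labels to distinct
    # nodes must induce a connected subgraph, i.e. exactly one matched node
    # has its parent outside the match.
    nodes_by_label = {}
    for node, (label, _parent) in tree.items():
        nodes_by_label.setdefault(label, []).append(node)
    candidates = [nodes_by_label.get(l, []) for l in labels]
    if any(not c for c in candidates):
        return False
    for combo in product(*candidates):
        chosen = set(combo)
        if len(chosen) < len(labels):
            continue  # two labels mapped onto the same node
        entry_points = sum(1 for n in chosen if tree[n][1] not in chosen)
        if entry_points == 1:
            return True
    return False

# 'a' and 'c' lie in different branches of the instance, so the connected
# view misses the association while the disconnected view still reports it.
print(labels_present(tree, ['a', 'c']))        # True
print(connected_occurrence(tree, ['a', 'c']))  # False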
Our future work will investigate the application domains where including such
association rules can be beneficial, and the right way to combine them with traditional
subtree patterns for optimal performance.
Furthermore, the chi-square and logistic regression measures were used as a
case in point for statistics-based rule filtering, while Symmetrical Tau was utilized
in the feature subset selection process. However, no claim is being
made that these are the optimal measures for their specific purposes.
In fact, the confidence constraint was used here because of the stronger focus
on statistical quality assessment and the differences from the rule sets discovered
using the traditional support and confidence framework. However, many other mea-
sures could be applied instead of the support and/or confidence constraints,
which, as discussed in several works [12, 23, 29], may yield more interesting rules.
Therefore, another line of future work will evaluate combinations of other constraints,
statistical measures and techniques for rule removal and attribute relevance determina-
tion in the context of the tree-structured data domain.
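As an illustration of the kind of statistical filtering and feature ranking referred to above, the sketch below (Python, with illustrative counts and hypothetical function names, not the implementation used in the chapter) computes the chi-square statistic of a rule X -> Y from its 2x2 contingency table and the Symmetrical Tau of an attribute against the class from a joint count table.

from itertools import product

def chi_square(n_xy, n_x_only, n_y_only, n_neither):
    # Chi-square statistic for rule X -> Y from observed co-occurrence counts.
    observed = [[n_xy, n_x_only], [n_y_only, n_neither]]
    total = float(sum(sum(row) for row in observed))
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    chi2 = 0.0
    for i, j in product(range(2), range(2)):
        expected = row_sums[i] * col_sums[j] / total
        chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

def symmetrical_tau(joint):
    # Symmetrical Tau from a joint attribute-by-class count table
    # (rows: attribute values, columns: class values).
    total = float(sum(sum(row) for row in joint))
    p = [[c / total for c in row] for row in joint]
    p_row = [sum(row) for row in p]                 # attribute marginals
    p_col = [sum(col) for col in zip(*p)]           # class marginals
    cells = [(i, j) for i in range(len(p)) for j in range(len(p[0]))]
    num = (sum(p[i][j] ** 2 / p_col[j] for i, j in cells)
           + sum(p[i][j] ** 2 / p_row[i] for i, j in cells)
           - sum(m ** 2 for m in p_row) - sum(m ** 2 for m in p_col))
    den = 2.0 - sum(m ** 2 for m in p_row) - sum(m ** 2 for m in p_col)
    return num / den

# A rule is kept only if its chi-square exceeds the 95% critical value
# for one degree of freedom; the counts here are purely illustrative.
print(chi_square(40, 10, 15, 35) > 3.841)
print(symmetrical_tau([[30, 10], [5, 55]]))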
References
1. Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large
databases. ACM SIGMOD Rec. 22 (2), 207-216 (1993)
2. Aumann, Y., Lindell, Y.: A statistical theory for quantitative association rules. J. Intell. Inf. Syst.
20 (3), 253-283 (2003)
3. Bathoorn, R., Koopman, A., Siebes, A.: Reducing the frequent pattern set. In: Proceedings of
the 6th IEEE International Conference on Data Mining—Workshops, pp. 55-59 (2006)
4. Bayardo, R., Agrawal, R., Gunopulos, D.: Constraint-based rule mining in large, dense data-
bases. Data Min. Knowl. Discov. 4 (2-3), 217-240 (2000)
5. Blanchard, J., Guillet, F., Gras, R., Briand, H.: Using information-theoretic measures to assess
association rule interestingness. In: Proceedings of the 5th IEEE International Conference on
Data Mining, pp. 215-238 (2005)
6. Bolón-Canedo, V., Sánchez-Maroño, N., Alonso-Betanzos, A.: A review of feature selection
methods on synthetic data. Knowl. Inf. Syst. 34 (3), 483-519 (2013)
7. Brijs, T., Vanhoof, K., Wets, G.: Defining interestingness for association rules. Int. J. Inf. Theor.
Appl. 10 (4), 370-376 (2003)
 