optimization came at the cost of a large reduction in coverage rate. The results on
this data were compared with those of a structural classifier based on traditional subtrees,
and some important differences and implications were highlighted. The results show
that associations based on disconnected subtrees can be useful, and that the positional
constraint can often result in more precise rules for structurally varied data, although at
the cost of a lower coverage rate. From these findings one can conclude that, when
forming association rules for tree-structured data, one should not be constrained to a
valid and connected subtree, because an interesting association can occur anywhere in a
tree instance and does not need to form a connected subtree of that instance. These
findings indicate that including disconnected subtrees, and constraining the subtrees
by their exact occurrence in the database, in addition to traditional subtree patterns,
could improve classifiers for tree-structured data. The method used in this chapter
is to be seen as complementary to, and in no way a replacement for, the traditional way
in which subtrees are mined.
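To make the distinction concrete, the following is a minimal sketch in Python (with hypothetical names and a deliberately simplified tree representation, not the chapter's algorithm). It checks whether a set of labels occurs anywhere in a tree instance, irrespective of connectivity, versus whether some matching of those labels forms a connected subtree of the instance.

from itertools import product

# Each tree instance: node id -> (label, parent id); the root's parent is None.
tree = {0: ('r', None), 1: ('a', 0), 2: ('b', 0), 3: ('c', 2)}

def labels_present(tree, labels):
    # Disconnected view: the association holds as soon as every label
    # occurs somewhere in the instance, regardless of how the nodes connect.
    found = {label for (label, _parent) in tree.values()}
    return set(labels) <= found

def connected_occurrence(tree, labels):
    # Connected view (simplified): some assignment of the labels to distinct
    # nodes must induce a connected subgraph, i.e. exactly one matched node
    # has its parent outside the match.
    nodes_by_label = {}
    for node, (label, _parent) in tree.items():
        nodes_by_label.setdefault(label, []).append(node)
    candidates = [nodes_by_label.get(l, []) for l in labels]
    if any(not c for c in candidates):
        return False
    for combo in product(*candidates):
        chosen = set(combo)
        if len(chosen) < len(labels):
            continue  # two labels mapped onto the same node
        entry_points = sum(1 for n in chosen if tree[n][1] not in chosen)
        if entry_points == 1:
            return True
    return False

# 'a' and 'c' lie in different branches of the instance, so the connected
# view misses the association while the disconnected view still reports it.
print(labels_present(tree, ['a', 'c']))        # True
print(connected_occurrence(tree, ['a', 'c']))  # False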
Our future work will investigate the application domains where including such
association rules can be beneficial, and the right way to combine them with traditional
subtree patterns for optimal performance.
Furthermore, the chi-square and logistic regression measures were used as a
case in point for statistics-based rule filtering, while Symmetrical Tau was utilized
in the feature subset selection process. However, no claim is being
made that these are the optimal measures for their specific purposes.
In fact, the confidence constraint was used here because of the stronger focus
on statistical quality assessment and the differences from the rule sets discovered
using the traditional support and confidence framework. However, many other mea-
sures could be applied instead of the support and/or confidence constraints,
which, as discussed in several works [12, 23, 29], may yield more interesting rules.
Therefore, another line of future work will evaluate combinations of other constraints,
statistical measures and techniques for rule removal and attribute relevance determina-
tion in the context of the tree-structured data domain.
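As an illustration of the kind of statistical filtering and feature ranking referred to above, the sketch below (Python, with illustrative counts and hypothetical function names, not the implementation used in the chapter) computes the chi-square statistic of a rule X -> Y from its 2x2 contingency table and the Symmetrical Tau of an attribute against the class from a joint count table.

from itertools import product

def chi_square(n_xy, n_x_only, n_y_only, n_neither):
    # Chi-square statistic for rule X -> Y from observed co-occurrence counts.
    observed = [[n_xy, n_x_only], [n_y_only, n_neither]]
    total = float(sum(sum(row) for row in observed))
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    chi2 = 0.0
    for i, j in product(range(2), range(2)):
        expected = row_sums[i] * col_sums[j] / total
        chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

def symmetrical_tau(joint):
    # Symmetrical Tau from a joint attribute-by-class count table
    # (rows: attribute values, columns: class values).
    total = float(sum(sum(row) for row in joint))
    p = [[c / total for c in row] for row in joint]
    p_row = [sum(row) for row in p]                 # attribute marginals
    p_col = [sum(col) for col in zip(*p)]           # class marginals
    cells = [(i, j) for i in range(len(p)) for j in range(len(p[0]))]
    num = (sum(p[i][j] ** 2 / p_col[j] for i, j in cells)
           + sum(p[i][j] ** 2 / p_row[i] for i, j in cells)
           - sum(m ** 2 for m in p_row) - sum(m ** 2 for m in p_col))
    den = 2.0 - sum(m ** 2 for m in p_row) - sum(m ** 2 for m in p_col)
    return num / den

# A rule is kept only if its chi-square exceeds the 95% critical value
# for one degree of freedom; the counts here are purely illustrative.
print(chi_square(40, 10, 15, 35) > 3.841)
print(symmetrical_tau([[30, 10], [5, 55]]))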
References
1. Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large
databases. ACM SIGMOD Rec. 22 (2), 207-216 (1993)
2. Aumann, Y., Lindell, Y.: A statistical theory for quantitative association rules. J. Intell. Inf. Syst.
20 (3), 253-283 (2003)
3. Bathoorn, R., Koopman, A., Siebes, A.: Reducing the frequent pattern set. In: Proceedings of
the 6th IEEE International Conference on Data Mining—Workshops, pp. 55-59 (2006)
4. Bayardo, R., Agrawal, R., Gunopulos, D.: Constraint-based rule mining in large, dense data-
bases. Data Min. Knowl. Discov. 4 (2-3), 217-240 (2000)
5. Blanchard, J., Guillet, F., Gras, R., Briand, H.: Using information-theoretic measures to assess
association rule interestingness. In: Proceedings of the 5th IEEE International Conference on
Data Mining, pp. 215-238 (2005)
6. Bolón-Canedo, V., Sánchez-Maroño, N., Alonso-Betanzos, A.: A review of feature selection
methods on synthetic data. Knowl. Inf. Syst. 34 (3), 483-519 (2013)
7. Brijs, T., Vanhoof, K., Wets, G.: Defining interestingness for association rules. Int. J. Inf. Theor.
Appl. 10 (4), 370-376 (2003)
 