Chapter 10
Irrelevant Feature and Rule Removal for Structural Associative Classification Using Structure-Preserving Flat Representation
Izwan Nizal Mohd Shaharanee and Fedja Hadzic
Abstract Practical applications of association rule mining often suffer from the
overwhelming number of rules that are generated, many of which are neither interesting
nor useful for the application in question. Removing irrelevant features, and/or rules
composed of irrelevant features, can significantly improve overall performance.
Many statistical and constraint-based measures are used to discard unnecessary and
irrelevant features and rules when the data are vectorial or tabular. In contrast,
the use of such measures is limited in the tree-structured data domain, because
structural aspects are not easily incorporated into them. In this chapter, we explore the use
of a feature subset selection measure, as well as a number of common statistical
interestingness measures, via a recently proposed structure-preserving flat representation
for tree-structured data such as XML. Feature subset selection is applied prior to
association rule generation. Once the initial set of rules is obtained, rules are deemed
irrelevant if they contain attributes that were not found to be statistically
significant for the classification task. The experiments are performed using real-world
web access trees and a property management dataset. The results indicate that where
the dataset has a more standard structure, a large number of insignificant rules are
discarded and accuracy increases. However, where the tree instances vary greatly in
structure and in the distribution of labels among nodes, many rules are still removed
and accuracy increases, but the coverage rate of the rule set drops significantly.
Keywords Tree-structured data · Association rule based classification · Feature subset selection · Statistical interestingness
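To make the idea of a structure-preserving flat representation more concrete, here is a simplified sketch. It is not the chapter's exact encoding. It assumes an in-memory `(label, [children])` tree format and identifies each column by the node's position path, meaning the child indices from the root. Every tree becomes one row, and each label lands in the column for the structural position where it occurs. The names `node_positions` and `flatten` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory tree encoding (an assumption): (label, [children]).
Tree = Tuple[str, list]
Position = Tuple[int, ...]  # child indices on the path from the root

def node_positions(tree: Tree, path: Position = ()) -> Dict[Position, str]:
    """Map each node's position path to its label (pre-order traversal)."""
    label, children = tree
    positions = {path: label}
    for i, child in enumerate(children):
        positions.update(node_positions(child, path + (i,)))
    return positions

def flatten(trees: List[Tree]) -> Tuple[List[Position], List[List[str]]]:
    """One column per position occurring in any tree, one row per tree.
    Positions a tree lacks are left empty, so every label keeps its structural slot."""
    rows = [node_positions(t) for t in trees]
    # Sorting position tuples lexicographically yields pre-order column order.
    columns = sorted({p for row in rows for p in row})
    table = [[row.get(c, "") for c in columns] for row in rows]
    return columns, table

trees = [
    ("page", [("home", []), ("news", [("sport", [])])]),
    ("page", [("home", [])]),
]
columns, table = flatten(trees)
# columns -> [(), (0,), (1,), (1, 0)]
# table   -> [['page', 'home', 'news', 'sport'], ['page', 'home', '', '']]
```

Because each column stands for a fixed position in the merged structure, tabular measures can be applied column by column while every label's ancestry stays recoverable from its column key.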
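The two-step pruning described in the abstract can be sketched as follows. First, features are selected before rule generation. Then any rule whose antecedent uses an unselected attribute is discarded. The abstract does not name the selection measure, so this sketch substitutes a chi-square test of independence between each attribute and the class label. That test is a stand-in chosen for illustration, not necessarily the chapter's measure. The helper names, the significance level `alpha = 0.05`, and the toy data are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]               # attribute (e.g. a flat-table column) -> value
Rule = Tuple[Dict[str, str], str]  # (antecedent attribute-value pairs, class)

def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with df degrees of freedom, via the
    series expansion of the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-12 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi_square_p_value(values: List[str], classes: List[str]) -> float:
    """Chi-square test of independence between one attribute and the class label."""
    n = len(values)
    observed = Counter(zip(values, classes))
    value_counts, class_counts = Counter(values), Counter(classes)
    stat = 0.0
    for v in value_counts:
        for c in class_counts:
            expected = value_counts[v] * class_counts[c] / n
            stat += (observed[(v, c)] - expected) ** 2 / expected
    df = (len(value_counts) - 1) * (len(class_counts) - 1)
    return chi2_sf(stat, df) if df > 0 else 1.0  # a constant attribute is never significant

def select_features(rows: List[Row], classes: List[str], alpha: float = 0.05) -> Set[str]:
    """Keep attributes whose association with the class is significant at level alpha."""
    attributes = sorted({a for row in rows for a in row})
    return {
        a for a in attributes
        # A missing value (e.g. an absent tree position) is treated as its own category "".
        if chi_square_p_value([row.get(a, "") for row in rows], classes) < alpha
    }

def prune_rules(rules: List[Rule], significant: Set[str]) -> List[Rule]:
    """Discard every rule whose antecedent uses an attribute that was not selected."""
    return [r for r in rules if set(r[0]) <= significant]

# Hypothetical toy data: attribute "A" tracks the class, attribute "B" does not.
rows = [{"A": "x", "B": "p"}, {"A": "x", "B": "q"}] * 2 + [{"A": "y", "B": "p"}, {"A": "y", "B": "q"}] * 2
classes = ["pos"] * 4 + ["neg"] * 4
significant = select_features(rows, classes)  # {'A'}
rules = [({"A": "x"}, "pos"), ({"A": "x", "B": "p"}, "pos"), ({"B": "q"}, "neg")]
kept = prune_rules(rules, significant)        # [({'A': 'x'}, 'pos')]
```

The toy data are far too small for the chi-square approximation to be reliable. They only show the control flow: an attribute that fails the test removes every rule built on it.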
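The abstract reports a trade-off after pruning: accuracy rises but coverage can fall. The minimal sketch below shows how both quantities might be computed for a rule set. It relies on assumptions not stated in the source. Each instance is classified by the first matching rule, coverage is the fraction of instances matched by at least one rule, and accuracy is measured over covered instances only. The function names and data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]
Rule = Tuple[Dict[str, str], str]  # (antecedent attribute-value pairs, predicted class)

def matches(antecedent: Dict[str, str], row: Row) -> bool:
    """A rule fires when every attribute-value pair in its antecedent holds."""
    return all(row.get(a) == v for a, v in antecedent.items())

def evaluate(rules: List[Rule], rows: List[Row], classes: List[str]) -> Tuple[float, float]:
    """Return (accuracy, coverage). Each instance is classified by the first
    matching rule; accuracy is measured over covered instances only."""
    covered = correct = 0
    for row, label in zip(rows, classes):
        rule: Optional[Rule] = next((r for r in rules if matches(r[0], row)), None)
        if rule is None:
            continue  # no rule fires: the instance is uncovered
        covered += 1
        if rule[1] == label:
            correct += 1
    coverage = covered / len(rows)
    accuracy = correct / covered if covered else 0.0
    return accuracy, coverage

rows = [{"A": "x", "B": "p"}, {"A": "x", "B": "q"}, {"A": "y", "B": "p"}, {"A": "y", "B": "q"}]
classes = ["pos", "pos", "neg", "neg"]
full_rules = [({"A": "x"}, "pos"), ({"B": "p"}, "pos"), ({"B": "q"}, "neg")]
# Hypothetically, suppose "B" was judged insignificant, so its rules are dropped.
pruned_rules = [r for r in full_rules if "B" not in r[0]]
print(evaluate(full_rules, rows, classes))    # (0.75, 1.0)
print(evaluate(pruned_rules, rows, classes))  # (1.0, 0.5)
```

In this toy case, removing the rules built on `B` eliminates the one misclassification but leaves half the instances without a matching rule. This mirrors, in miniature, the accuracy-versus-coverage pattern the abstract describes.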