Irrelevant Feature and Rule Removal for Structural Associative Classification Using Structure-Preserving Flat Representation - Feature Selection for Data and Pattern Recognition

Chapter 10

Irrelevant Feature and Rule Removal for Structural Associative Classification Using Structure-Preserving Flat Representation

Izwan Nizal Mohd Shaharanee and Fedja Hadzic

Abstract Practical applications of association rule mining often suffer from an overwhelming number of generated rules, many of which are not interesting or useful for the application in question. Removing irrelevant features, and/or rules comprised of irrelevant features, can significantly improve overall performance. Many statistical and constraint-based measures are used to discard unnecessary and irrelevant features and rules when the data is vectorial or tabular. In contrast, the use of such measures is limited in the tree-structured data domain, because the structural aspects are not easily incorporated. In this chapter, we explore the use of a feature subset selection measure, as well as a number of common statistical interestingness measures, via a recently proposed structure-preserving flat representation for tree-structured data such as XML. Feature subset selection is applied prior to association rule generation. Once the initial set of rules is obtained, irrelevant rules are identified as those comprised of attributes not determined to be statistically significant for the classification task. The experiments are performed using real-world web access trees and a property management dataset. The results indicate that where the dataset has a more standard structure, a large number of insignificant rules are discarded and accuracy increases. However, where the tree instances can vary greatly in terms of structure and label distribution among nodes, many rules are removed and accuracy increases, but there is a significant reduction in the coverage rate of the rule set.

Keywords Tree-structured data · Association rule based classification · Feature subset selection · Statistical interestingness
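
The rule-removal step and the accuracy/coverage trade-off described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the `Rule` class, the attribute names `X0`, `X1`, `X2`, the first-matching-rule prediction policy, the definition of accuracy as the fraction of covered instances classified correctly, and the toy data. The set of statistically significant attributes is taken as given here, since the abstract does not name the selection measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical class association rule over a flat (attribute, value) table."""
    antecedent: frozenset  # set of (attribute, value) pairs
    label: str             # predicted class


def remove_irrelevant_rules(rules, significant_attributes):
    """Keep only rules whose antecedent uses significant attributes exclusively."""
    return [
        r for r in rules
        if all(attr in significant_attributes for attr, _ in r.antecedent)
    ]


def evaluate(rules, instances):
    """Return (accuracy, coverage) for a rule list.

    Assumed definitions: an instance is covered if any rule antecedent is a
    subset of its items; the first matching rule predicts its class; accuracy
    is measured over covered instances only.
    """
    covered = 0
    correct = 0
    for features, label in instances:
        items = set(features.items())
        match = next((r for r in rules if r.antecedent <= items), None)
        if match is None:
            continue
        covered += 1
        if match.label == label:
            correct += 1
    coverage = covered / len(instances) if instances else 0.0
    accuracy = correct / covered if covered else 0.0
    return accuracy, coverage


# Toy data (invented for illustration, not results from the chapter).
rules = [
    Rule(frozenset({("X0", "a")}), "pos"),
    Rule(frozenset({("X1", "b")}), "neg"),
    Rule(frozenset({("X0", "a"), ("X2", "c")}), "pos"),
]
instances = [
    ({"X0": "a", "X1": "b"}, "pos"),
    ({"X0": "z", "X1": "b"}, "pos"),
    ({"X0": "a", "X2": "c"}, "pos"),
    ({"X0": "z", "X1": "q"}, "neg"),
]

# Assume only X0 was found significant by the feature subset selection step.
kept = remove_irrelevant_rules(rules, {"X0"})
before = evaluate(rules, instances)
after = evaluate(kept, instances)
```

On this toy data, filtering leaves one rule; accuracy rises from 2/3 to 1.0 while coverage drops from 0.75 to 0.5, which mirrors the kind of trade-off the abstract reports for datasets with highly variable tree structure.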
