10.2.1 Relationship Between Feature Subset Selection and Frequent Subtree Interestingness
In general, the objective of feature subset selection, as defined in [18], is to find a
minimum set of attributes such that the resulting probability distribution of the data
classes is as close as possible to the original distribution obtained using all attributes.
Han and Kamber [18] note that domain expertise can be employed to pick out useful
attributes. However, because data mining tasks involve large volumes of data whose
behaviour is difficult to anticipate in advance, manual attribute selection is often
expensive and time consuming.
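As a toy illustration of this objective, the minimal sketch below (hypothetical data and attribute names; pandas is assumed to be available) compares the class distribution obtained after projecting the data onto a candidate attribute subset against the distribution obtained using all attributes. If the two distributions agree, the remaining attributes add nothing to class discrimination and can be dropped.

```python
import pandas as pd

# Hypothetical toy records: two attributes carry class information,
# one ("noise") is irrelevant.
data = pd.DataFrame({
    "colour": ["red", "red", "blue", "blue", "red", "blue"],
    "shape":  ["box", "box", "box", "ball", "ball", "ball"],
    "noise":  ["x", "y", "x", "y", "x", "y"],
    "cls":    ["A", "A", "B", "B", "A", "B"],
})

def class_distribution(df, attributes):
    """Class proportions within each group defined by the given attributes."""
    return df.groupby(attributes)["cls"].value_counts(normalize=True)

# Distribution using all attributes vs. using only {colour}.
print(class_distribution(data, ["colour", "shape", "noise"]))
print(class_distribution(data, ["colour"]))
# Here {colour} alone reproduces the class distribution, so the other
# attributes are redundant for this toy classification task.
```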
Tests of statistical significance are among the prominent approaches for evaluating
attribute/feature usefulness. Stepwise forward selection, stepwise backward
selection, and a combination of the two are three commonly used heuristic techniques
employed with statistical significance tests such as linear regression and logistic
regression [18]. Correlation analysis, such as the chi-square test, is also valuable
for identifying redundant variables during feature subset selection.
Another powerful technique for this purpose is the Symmetrical Tau [54],
a statistical-heuristic feature selection criterion that measures the capability of one
attribute to predict the class of another attribute. Information gain is a further
attribute relevance analysis method, employed in the popular ID3 [33]
and C4.5 [34] decision tree algorithms, as reported in [18]. An extensive overview and
comparison of the different approaches to the feature subset selection problem is
provided in [6, 11, 21, 30].
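As an illustration of the Symmetrical Tau criterion, the sketch below computes it from a contingency table whose rows are the values of a candidate attribute and whose columns are the class labels. The formulation used is the common Goodman-Kruskal-based definition; this is an assumption made here for illustration rather than a detail given in this section. A higher value indicates an attribute with greater capability to predict the class.

```python
import numpy as np

def symmetrical_tau(contingency):
    """Symmetrical Tau for a contingency table.

    Rows index the values of the candidate attribute, columns index the
    class labels; cell (i, j) counts records with attribute value i and class j.
    """
    P = np.asarray(contingency, dtype=float)
    P = P / P.sum()                          # joint probabilities P_ij
    row = P.sum(axis=1)                      # attribute-value marginals P_i+
    col = P.sum(axis=0)                      # class marginals P_+j
    num = ((P ** 2 / col).sum()              # sum_ij P_ij^2 / P_+j
           + (P ** 2 / row[:, None]).sum()   # sum_ij P_ij^2 / P_i+
           - (row ** 2).sum()
           - (col ** 2).sum())
    den = 2.0 - (row ** 2).sum() - (col ** 2).sum()
    return num / den

print(symmetrical_tau([[40, 5], [5, 50]]))    # strongly class-predictive attribute
print(symmetrical_tau([[25, 25], [25, 25]]))  # uninformative attribute (tau = 0)
```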
While the original purpose of feature subset selection is to reduce the set of attributes
to only those relevant for a certain data mining task, the same measures can nevertheless
be utilized to assess the interestingness of the rules/patterns generated. For
example, if a rule/pattern consists of irrelevant attributes, such a measure can give
an indication that the rule/pattern is not interesting. Moreover,
[12] identifies three roles of interestingness measures. The first is their
ability to discard uninteresting patterns during the mining process, thereby narrowing
the search space and improving mining efficiency. The second is to calculate
an interestingness score for each pattern, which allows patterns to be ranked
according to specific needs. The third is their use during the post-processing stage
to select interesting patterns. Interestingness measures
such as the chi-square test [8], Symmetrical Tau [54] and Mutual Information [44]
are capable of measuring the interestingness of rules while at the same time identifying
useful features for frequent patterns.
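As a rough sketch of the second and third roles, each frequent pattern's occurrence indicator can be scored against the class label and the patterns ranked during post-processing. The pattern names and data below are hypothetical, and the Mutual Information used is the standard empirical estimate, assumed here rather than taken from the formulation in [44].

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in bits) between two discrete arrays."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))
            p_x, p_y = np.mean(x == xv), np.mean(y == yv)
            if p_xy > 0:
                mi += p_xy * np.log2(p_xy / (p_x * p_y))
    return mi

# Hypothetical post-processing: rank frequent patterns by the mutual
# information between their occurrence indicator and the class label.
classes = np.array([1, 1, 1, 0, 0, 0, 1, 0])
patterns = {
    "pattern_A": np.array([1, 1, 1, 0, 0, 0, 1, 0]),  # tracks the class
    "pattern_B": np.array([1, 0, 1, 0, 1, 0, 1, 0]),  # unrelated to the class
}
ranked = sorted(patterns,
                key=lambda p: mutual_information(patterns[p], classes),
                reverse=True)
print(ranked)  # class-correlated (interesting) patterns come first
```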
Since frequent patterns are generated based solely on frequency, without
considering their predictive power, using frequent patterns without selecting
appropriate features will still result in a huge feature space, which leads to a larger
volume and greater complexity of rules. This not only slows down the model learning
process but, even worse, can cause classification accuracy to deteriorate (another
kind of overfitting issue, since the features are numerous) [9].
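A minimal sketch of such pruning is given below: only the patterns whose occurrence is most strongly associated with the class are kept before a classifier is learned. The data are hypothetical, and scikit-learn's SelectKBest/chi2 and BernoulliNB are used here purely as one convenient choice of tooling, not as techniques prescribed by this section.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 500))        # 500 binary pattern features
y = (X[:, 0] | X[:, 1]).astype(int)            # only a few patterns matter

selector = SelectKBest(score_func=chi2, k=20)  # discard weakly predictive patterns
X_small = selector.fit_transform(X, y)

clf = BernoulliNB().fit(X_small, y)            # learn on the reduced feature space
print(X.shape, "->", X_small.shape)
```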
 