A Pragmatic Approach to Summarize Association Rules in Business Analytics Projects - Technologies and Applications of Artificial Intelligence

the number of generated association rules. For example, one important early idea is to

use multilevel organization and summarization of discovered rules to eliminate

redundant rules [5]. Other early attempts include pruning association rules using the

idea of rule cover and grouping similar association rules using clustering [13]. Such

an approach processes the generated rules to create condensed rules that are easier to

understand. Other proposals along this line of research explore the use of ontologies

[7, 8] or domain knowledge [2] to prune or group discovered association rules.

Another direction of research focuses on generating non-redundant association

rules. Some researchers use the concept of frequent closed itemsets to generate non-

redundant rules [15], or use the notion of representative basis to identify minimum

and unique association rules [6]. More recent proposals on generating non-redundant

rules include the idea of using reliable basis to represent association rules [14]. These

methods can effectively produce a set of non-redundant rules that is much smaller than

the rule set generated by the traditional approach. However, the number of less-redundant

rules generated is still high from a business end-user's point of view.
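
To make the notion of frequent closed itemsets concrete, the following is a minimal sketch, not the algorithm of [15]. It assumes the usual definition: a frequent itemset is closed if no proper superset has the same support count. The transactions, item names, and function names are hypothetical and chosen only for illustration, and the brute-force enumeration is suitable for toy data only.

```python
from itertools import combinations

# Hypothetical toy transactions (not taken from the paper).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_count):
    """Brute-force enumeration of all itemsets with count >= min_count."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = support_count(candidate, transactions)
            if count >= min_count:
                result[candidate] = count
    return result


def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same count."""
    closed = {}
    for itemset, count in frequent.items():
        has_equal_superset = any(
            itemset < other and count == other_count
            for other, other_count in frequent.items()
        )
        if not has_equal_superset:
            closed[itemset] = count
    return closed


frequent = frequent_itemsets(TRANSACTIONS, min_count=2)
closed = closed_itemsets(frequent)
print(len(frequent), "frequent itemsets,", len(closed), "closed itemsets")
```

On this toy data the closed itemsets form a strict subset of the frequent itemsets, which is why rules derived from them tend to be fewer, although on real data the reduced set can still be large.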

Fig. 1. Process of Association Rule Mining

The second issue of association rule mining is the difficulty in setting thresholds

for interestingness metrics (e.g., support and confidence measures). Usually, improper

threshold settings can result in generating either too few or too many rules, and in

either case this impedes identification of interesting rules. To tackle this issue, some

researchers (e.g., [12]) attempt to derive minimum support based on additional

metrics such as lift or conviction measures, but this requires other user-specified input

values. More recent work attempts to avoid user input by deriving the support

threshold from the data alone [9]. However, different subsets of rules may need

different thresholds and a single threshold value may not be able to extract all

interesting rules.
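
The sensitivity to thresholds can be illustrated with a small sketch. It assumes the standard textbook definitions of support, confidence, lift, and conviction for a rule A -> B; it is not the method of [9] or [12]. The transactions, candidate rules, threshold values, and function names are hypothetical.

```python
import math
from itertools import permutations

# Hypothetical toy transactions (not taken from the paper).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Standard interestingness measures for the rule antecedent -> consequent."""
    supp_rule = support(antecedent | consequent, transactions)
    supp_a = support(antecedent, transactions)
    supp_c = support(consequent, transactions)
    confidence = supp_rule / supp_a if supp_a > 0 else 0.0
    lift = confidence / supp_c if supp_c > 0 else 0.0
    # Conviction is unbounded when the rule never fails (confidence == 1).
    if confidence == 1.0:
        conviction = math.inf
    else:
        conviction = (1 - supp_c) / (1 - confidence)
    return {
        "support": supp_rule,
        "confidence": confidence,
        "lift": lift,
        "conviction": conviction,
    }


def count_rules(min_support, min_confidence, transactions):
    """Count single-item rules x -> y that pass both thresholds."""
    items = sorted(set().union(*transactions))
    passed = 0
    for x, y in permutations(items, 2):
        m = rule_metrics(frozenset({x}), frozenset({y}), transactions)
        if m["support"] >= min_support and m["confidence"] >= min_confidence:
            passed += 1
    return passed


for min_supp, min_conf in [(0.3, 0.45), (0.5, 0.7), (0.5, 0.9), (0.7, 0.5)]:
    n = count_rules(min_supp, min_conf, TRANSACTIONS)
    print(f"min_support={min_supp}, min_confidence={min_conf}: {n} rules")
```

Even on five transactions, moderate changes in the two thresholds move the result from every candidate rule to none, which mirrors the too-few or too-many problem described above.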

Note that most of the aforementioned work focuses on finding a small set of

representative rules to replace redundant rules. Our work in this paper serves a

different purpose: to provide a highly succinct summary of all the rules generated,

and to provide a summary of the interestingness measures, so as to guide users in

setting more appropriate interestingness thresholds. Hence, our proposed method

focuses on the post-processing of rules generated by any association rule mining

method.
