4.1 Experimental Setup
We used two sales transaction datasets for association rule mining. Each dataset
consists of rows representing transactions and columns representing products.
Purchased products are coded as 'True', and products not purchased are coded as
'False'. In our experiments, we use eight different settings of support and confidence
thresholds for each dataset. Each setting is abbreviated as xAyC, where x is
the antecedent support threshold and y is the confidence threshold, both expressed
as percentages.
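As a minimal sketch of the notation, the snippet below parses a setting label such as 20A60C and checks one rule against it on a True/False transaction table. The function names (parse_setting, rule_metrics) and the five toy transactions are hypothetical and are not taken from either dataset; antecedent support is assumed here to be the fraction of transactions that contain every antecedent item.

```python
import re


def parse_setting(label):
    """Split a label such as '20A60C' into (antecedent support, confidence) fractions."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)A(\d+(?:\.\d+)?)C", label)
    if match is None:
        raise ValueError(f"not an xAyC label: {label!r}")
    return float(match.group(1)) / 100, float(match.group(2)) / 100


def rule_metrics(transactions, antecedent, consequent):
    """Return (antecedent support, confidence) for the rule antecedent -> consequent."""
    # Transactions in which every antecedent item was purchased.
    with_antecedent = [t for t in transactions if all(t[item] for item in antecedent)]
    if not with_antecedent:
        return 0.0, 0.0
    support = len(with_antecedent) / len(transactions)
    confidence = sum(1 for t in with_antecedent if t[consequent]) / len(with_antecedent)
    return support, confidence


# Hypothetical toy table: one dict per transaction, one key per product.
transactions = [
    {"E": True, "I": True, "A": False},
    {"E": True, "I": True, "A": True},
    {"E": False, "I": False, "A": True},
    {"E": True, "I": False, "A": False},
    {"E": False, "I": True, "A": False},
]

min_support, min_confidence = parse_setting("20A60C")
support, confidence = rule_metrics(transactions, ["E"], "I")
keeps_rule = support >= min_support and confidence >= min_confidence
```

On this toy table the rule E -> I has an antecedent support of 3/5 and a confidence of 2/3, so it would pass the 20A60C setting.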
The first dataset is named “Online Purchase”. It records over 1,000 transactions of
17 anonymous products. We use this dataset to study the properties of rule summaries
when different threshold settings are used.
The second dataset is named “Sales Transaction”. It is a large real-world dataset
that records over 350,000 transactions of 800 consumer electronics products in one
year. We use this dataset to demonstrate how actionable insights can be derived from
the rule summaries.
Since CARS is a general method that does not depend on any specific association
rule mining algorithm, we chose the most commonly used Apriori algorithm [1] for
the experiments. This algorithm is available in the IBM SPSS Modeler® 14.1 data
mining workbench, which was installed on a Pentium PC running the Windows 7
operating system. The algorithm was applied to the datasets using different support and
confidence threshold settings. For each set of thresholds, the generated rules were
summarized as consequent-based association rule summaries.
4.2 Interpreting and Tuning for Rule Summaries
To illustrate how CARS produces rule summaries, we examine two summaries
derived from a set of five rules generated from the Online Purchase dataset using a
support and confidence threshold setting of 20A60C.
Table 1 shows that the first three rules, which share the same consequent (i.e., 'I'), are
condensed into Rule Summary 1 in Table 2. Similarly, Rules 4 and 5, which share the
consequent 'A', are condensed into Rule Summary 2.
In each rule summary, the count of occurrences for each item is shown by a
number after the asterisk. For example, Rule Summary 1 has the consequent item “I*3”,
meaning that the consequent item 'I' occurs three times. Property 1 suggests
that a more important rule summary has a higher consequent frequency. Hence Rule
Summary 1 is more important than Rule Summary 2 because it has a higher
consequent frequency.
In each rule summary, the count of occurrences for each antecedent item is
reflected under the antecedent frequency. For example, in Rule Summary 1, items 'E',
'L', and 'M' all appear only once, so their antecedent frequencies are all one. If their
frequencies were different, then the antecedents could be ranked, with antecedents
having higher antecedent frequencies being considered more important.
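The condensing step described above can be sketched as a group-by on the consequent. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, Rules 1 to 3 are assumed to have the single antecedents 'E', 'L', and 'M', and the antecedents 'X' and 'Y' for Rules 4 and 5 are placeholders because Table 1 is not reproduced here.

```python
from collections import Counter, defaultdict


def summarize_by_consequent(rules):
    """Group (antecedent items, consequent) rules into consequent-based summaries."""
    consequent_counts = Counter()
    antecedent_counts = defaultdict(Counter)
    for antecedents, consequent in rules:
        consequent_counts[consequent] += 1
        antecedent_counts[consequent].update(antecedents)
    # most_common() sorts by frequency, highest first, so the summary with the
    # highest consequent frequency comes first and, within a summary, the most
    # frequent antecedent comes first.
    return [
        {
            "consequent": consequent,
            "consequent_frequency": frequency,
            "antecedents": antecedent_counts[consequent].most_common(),
        }
        for consequent, frequency in consequent_counts.most_common()
    ]


def format_summary(summary):
    """Render a summary in the item*count notation, e.g. 'E*1, L*1, M*1 -> I*3'."""
    left = ", ".join(f"{item}*{count}" for item, count in summary["antecedents"])
    return f"{left} -> {summary['consequent']}*{summary['consequent_frequency']}"


# Rules 1 to 3 share consequent 'I'; Rules 4 and 5 share consequent 'A'.
# 'X' and 'Y' are placeholder antecedents, not values from Table 1.
rules = [
    (["E"], "I"),
    (["L"], "I"),
    (["M"], "I"),
    (["X"], "A"),
    (["Y"], "A"),
]

summaries = summarize_by_consequent(rules)
lines = [format_summary(s) for s in summaries]
```

With these inputs the first summary is the one for consequent 'I' (frequency three) and the second is the one for consequent 'A' (frequency two), which matches the ordering implied by Property 1.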
With the confidence threshold fixed at 60%, Property 3 suggests that there is no
point in setting an antecedent support threshold higher than 34.21%. This is because