comes to the store with the intention of buying cookies, we are more confident that they
will also buy milk than if their intentions were reversed. This concept is referred to in
association rule mining as Premise → Conclusion. Premises are sometimes also referred
to as antecedents, while conclusions are sometimes referred to as consequents. For each
pairing, the confidence percentages will differ based on which attribute is the premise and
which the conclusion. When associations between three or more attributes are found, for
example, cookies, crackers → milk, the confidence percentage is the share of observations
containing both premise attributes (cookies and crackers) that also contain the conclusion
(milk). This can become complicated to do manually, so it is nice to have RapidMiner find
these combinations and run the calculations for us!
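To make the direction of a rule concrete, here is a minimal Python sketch of the confidence calculation. The ten baskets are an assumption: they are made up so that the counts reproduce the 75% and 43% figures used in this chapter's hypothetical example, and the confidence function is a hypothetical helper, not part of RapidMiner.

```python
# Hypothetical shopping baskets (an assumption): 3 with cookies and milk,
# 1 with cookies only, 4 with milk only, 2 with neither.
baskets = (
    [{"cookies", "milk"}] * 3
    + [{"cookies"}] * 1
    + [{"milk"}] * 4
    + [set()] * 2
)


def confidence(baskets, premise, conclusion):
    """Share of baskets containing the premise that also contain the conclusion."""
    premise_count = sum(1 for b in baskets if premise <= b)
    both_count = sum(1 for b in baskets if (premise | conclusion) <= b)
    return both_count / premise_count if premise_count else 0.0


cookies_to_milk = confidence(baskets, {"cookies"}, {"milk"})
milk_to_cookies = confidence(baskets, {"milk"}, {"cookies"})

print(f"cookies -> milk: {cookies_to_milk:.0%}")  # 3 of 4 cookie baskets
print(f"milk -> cookies: {milk_to_cookies:.0%}")  # 3 of 7 milk baskets
```

Because the premise and conclusion are passed as sets, the same helper would also handle a multi-attribute premise such as {"cookies", "crackers"}.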
The support percent is an easier measure to calculate. This is simply the number of times
that the rule did occur, divided by the number of observations in the data set. The number
of observations in the data set is the maximum number of times the association could have
occurred, since every customer could have purchased cookies and milk together in their
shopping basket. In practice they didn't, and such an outcome would be highly unlikely in
any analysis: possible, but unlikely. We know that in our hypothetical example, cookies and
milk were found together in three out of ten shopping baskets, so our support percentage
for this association is 30% (3/10 = .3, or 30%). There is no reciprocal for support
percentages since this metric is simply the number of times the association did occur over
the number of times it could have occurred in the data set.
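The support arithmetic can be sketched in a few lines of Python. As before, the basket contents are an assumption chosen only so that cookies and milk appear together in three of ten baskets, and the support function is a hypothetical helper rather than anything RapidMiner exposes.

```python
# Hypothetical shopping baskets (an assumption): cookies and milk
# appear together in 3 of the 10 baskets.
baskets = (
    [{"cookies", "milk"}] * 3
    + [{"cookies"}] * 1
    + [{"milk"}] * 4
    + [set()] * 2
)


def support(baskets, items):
    """Share of all baskets that contain every item in `items`."""
    occurrences = sum(1 for b in baskets if items <= b)
    return occurrences / len(baskets)


cookies_and_milk = support(baskets, {"cookies", "milk"})
print(f"support(cookies, milk) = {cookies_and_milk:.0%}")
```

The function takes one set of items with no premise or conclusion, which is why support has no reciprocal: {"cookies", "milk"} and {"milk", "cookies"} are the same set.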
So now that we understand these two pivotal parameters in association rule mining, let's
make a parameter modification and see if we find any association rules in our data. You
should be in design perspective again, but if not, switch back now. Click on your Create
Association Rules operator and change the min confidence parameter to .5 (see Figure 5-10).
This indicates to RapidMiner that any association with at least 50% confidence should be
displayed as a rule. With this as the confidence percent threshold, if we were using the
hypothetical shopping baskets discussed in the previous paragraphs to explain confidence
and support, cookies → milk would return as a rule because its confidence percent was
75%, while milk → cookies would not, due to that association's 43% confidence percent.
Let's run our model again with the .5 confidence value and see what we get.
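To see what a 50% threshold does to our hypothetical shopping baskets, here is a rough Python sketch of the filtering step. It only illustrates the idea of a minimum confidence cutoff and is not how RapidMiner is implemented; the basket contents, the MIN_CONFIDENCE name, and the confidence helper are all assumptions made for this example.

```python
# Hypothetical shopping baskets (an assumption), built so that
# cookies -> milk has 75% confidence and milk -> cookies has about 43%.
baskets = (
    [{"cookies", "milk"}] * 3
    + [{"cookies"}] * 1
    + [{"milk"}] * 4
    + [set()] * 2
)

MIN_CONFIDENCE = 0.5  # mirrors the min confidence value set above


def confidence(baskets, premise, conclusion):
    """Share of baskets containing the premise that also contain the conclusion."""
    premise_count = sum(1 for b in baskets if premise <= b)
    both_count = sum(1 for b in baskets if (premise | conclusion) <= b)
    return both_count / premise_count if premise_count else 0.0


# Candidate rules written as (premise, conclusion) pairs.
candidates = [("cookies", "milk"), ("milk", "cookies")]

# Keep only the candidates whose confidence is at least the threshold.
rules = [
    (premise, conclusion)
    for premise, conclusion in candidates
    if confidence(baskets, {premise}, {conclusion}) >= MIN_CONFIDENCE
]

for premise, conclusion in rules:
    print(f"{premise} -> {conclusion}")
```

Only cookies → milk survives the cutoff here; lowering MIN_CONFIDENCE to 0.4 would let milk → cookies through as well.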