Advanced Analytical Theory and Methods: Association Rules - Data Science and Big Data Analytics

5.3 Evaluation of Candidate Rules

Frequent itemsets from the previous section can form candidate rules such as X implies Y (X → Y). This section discusses how measures such as confidence, lift, and leverage can help evaluate the appropriateness of these candidate rules.
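
To make the step from itemset to rule concrete, the following minimal Python sketch lists every candidate rule that a single frequent itemset can produce: each split of the itemset into a non-empty antecedent X and a non-empty consequent Y. The helper name `candidate_rules` is hypothetical, and itemsets are assumed to be plain Python sets of item names.

```python
from itertools import combinations


def candidate_rules(itemset):
    """Yield every rule X -> Y that can be formed from one frequent itemset.

    X and Y are non-empty, disjoint, and together make up the whole itemset.
    """
    items = sorted(itemset)
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            x = frozenset(antecedent)
            y = frozenset(items) - x
            yield x, y


rules = list(candidate_rules({"bread", "eggs", "milk"}))
for x, y in rules:
    print(sorted(x), "->", sorted(y))
```

A k-itemset yields 2^k - 2 such rules, so {bread, eggs, milk} produces six candidates, each of which still has to be evaluated.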

Confidence [2] is defined as the measure of certainty or trustworthiness associated with each discovered rule. Mathematically, confidence is the percentage of transactions that contain both X and Y out of all the transactions that contain X (see Equation 5.1).

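$$\text{Confidence}(X \rightarrow Y) = \frac{\text{Support}(X \wedge Y)}{\text{Support}(X)} \qquad (5.1)$$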

For example, if {bread,eggs,milk} has a support of 0.15 and {bread,eggs} also has a support of 0.15, the confidence of the rule {bread,eggs} → {milk} is 0.15 / 0.15 = 1, which means that every time a customer buys bread and eggs, milk is bought as well. The rule is therefore correct for 100% of the transactions containing bread and eggs.
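
The bread, eggs, and milk example can be checked with a small Python sketch of Equation 5.1. The twenty transactions below are invented purely so that both itemsets have a support of 0.15; `support` and `confidence` are illustrative helper names, not a library API.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Equation 5.1: Support(X and Y) / Support(X)."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)


# Hypothetical data: 20 baskets, 3 of which contain bread and eggs,
# and all 3 of those also contain milk.
transactions = (
    [{"bread", "eggs", "milk"}] * 3
    + [{"bread", "milk"}] * 5
    + [{"eggs"}] * 4
    + [{"milk"}] * 8
)

print(support(transactions, {"bread", "eggs", "milk"}))        # 0.15
print(support(transactions, {"bread", "eggs"}))                # 0.15
print(confidence(transactions, {"bread", "eggs"}, {"milk"}))   # 1.0
```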

A relationship may be considered interesting when the algorithm finds that its confidence is greater than or equal to a predefined threshold. This predefined threshold is called the minimum confidence. A higher confidence indicates that the rule (X → Y) is more interesting or more trustworthy, based on the sample dataset.
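
Applying a minimum confidence threshold can be sketched in Python as follows. The support values are hypothetical, the helper `rules_above_min_confidence` is illustrative, and the threshold of 0.7 is an arbitrary choice for the example; each rule's confidence is computed with Equation 5.1 from the stored supports.

```python
from itertools import combinations

# Hypothetical support values for one frequent itemset and all of its subsets.
supports = {
    frozenset({"bread"}): 0.40,
    frozenset({"eggs"}): 0.25,
    frozenset({"milk"}): 0.50,
    frozenset({"bread", "eggs"}): 0.15,
    frozenset({"bread", "milk"}): 0.20,
    frozenset({"eggs", "milk"}): 0.15,
    frozenset({"bread", "eggs", "milk"}): 0.15,
}


def rules_above_min_confidence(itemset, supports, min_confidence):
    """Return (X, Y, confidence) for each rule X -> Y drawn from itemset
    whose confidence, Support(X and Y) / Support(X), meets min_confidence."""
    itemset = frozenset(itemset)
    kept = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            x = frozenset(antecedent)
            y = itemset - x
            conf = supports[itemset] / supports[x]
            if conf >= min_confidence:
                kept.append((x, y, conf))
    return kept


kept = rules_above_min_confidence({"bread", "eggs", "milk"}, supports, 0.7)
for x, y, conf in kept:
    print(sorted(x), "->", sorted(y), f"confidence={conf:.2f}")
```

With these numbers, only the three rules whose antecedent has two items clear the 0.7 threshold; for instance, {bread} → {eggs, milk} has confidence 0.15 / 0.40 = 0.375 and is discarded.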

So far, this chapter has talked about two common measures that the Apriori algorithm uses: support and confidence. All the rules can be ranked based on these two measures to filter out the uninteresting rules and retain the interesting ones.
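
Filtering and ranking by both measures might look like the following Python sketch. The rules and their support and confidence values are hypothetical, the thresholds are arbitrary, and sorting by confidence first with support as a tie-breaker is an assumption; the text does not prescribe an order.

```python
# Hypothetical candidate rules: (antecedent, consequent, support, confidence).
candidates = [
    ({"bread", "eggs"}, {"milk"}, 0.15, 1.00),
    ({"bread"}, {"milk"}, 0.20, 0.50),
    ({"milk"}, {"bread"}, 0.20, 0.40),
    ({"eggs"}, {"milk"}, 0.15, 0.60),
    ({"butter"}, {"jam"}, 0.02, 0.90),
]


def rank_rules(rules, min_support, min_confidence):
    """Drop rules below either threshold, then rank by confidence, then support."""
    kept = [r for r in rules if r[2] >= min_support and r[3] >= min_confidence]
    return sorted(kept, key=lambda r: (r[3], r[2]), reverse=True)


ranked = rank_rules(candidates, min_support=0.10, min_confidence=0.5)
for x, y, sup, conf in ranked:
    print(sorted(x), "->", sorted(y), f"support={sup:.2f} confidence={conf:.2f}")
```

Here {butter} → {jam} is dropped despite its high confidence because its support falls below 0.10, which is why the two measures are applied together.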

Even though confidence can identify the interesting rules from all the candidate rules, it comes with a problem. Given rules in the form of X → Y, confidence considers only the antecedent (X) and the co-occurrence of X and Y; it does not take the overall frequency of the consequent (Y) into account. Therefore, confidence cannot tell whether a rule reflects a true implication or is purely coincidental. X and Y can be statistically independent yet still receive a high confidence score: if Y appears in most transactions, a rule X → Y can score a high confidence even when X has no bearing on Y.
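
The following Python sketch illustrates the problem with invented data in which X = {candy} and Y = {milk} are independent by construction: milk appears in 90% of all baskets and in 90% of the baskets that contain candy. The item names and counts are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)


# Hypothetical data: 100 baskets, milk in 90 of them, candy in 10,
# and milk in 9 of the 10 candy baskets (the same 90% rate).
transactions = (
    [{"candy", "milk"}] * 9
    + [{"candy"}] * 1
    + [{"milk"}] * 81
    + [{"bread"}] * 9
)

sup_x = support(transactions, {"candy"})           # 10 / 100
sup_y = support(transactions, {"milk"})            # 90 / 100
sup_xy = support(transactions, {"candy", "milk"})  # 9 / 100

confidence = sup_xy / sup_x
print(f"confidence = {confidence:.2f}")
print(f"Support(X and Y) = {sup_xy:.2f}, Support(X) * Support(Y) = {sup_x * sup_y:.2f}")
```

The rule {candy} → {milk} reaches a confidence of 0.90, yet Support(X ∧ Y) equals Support(X) × Support(Y), which is exactly what independence predicts. The high score comes from milk being common, not from any link to candy.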

Other measures such as lift [6] and leverage [7] are designed to address this issue.

Lift measures how many times more often X and Y occur together than expected if they are statistically independent of each other. Lift is a measure [6] of how strongly X and Y are actually related, as opposed to occurring together by coincidence (see Equation 5.2).
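
Equation 5.2 is not reproduced in this section, so the Python sketch below assumes the standard form of lift, Support(X ∧ Y) / (Support(X) × Support(Y)), which matches the description above: observed co-occurrence divided by the co-occurrence expected under independence. Both datasets and all item names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)


def lift(transactions, x, y):
    """Observed support of X and Y together divided by the support expected
    if X and Y were independent (assumed form of Equation 5.2)."""
    observed = support(transactions, set(x) | set(y))
    expected = support(transactions, x) * support(transactions, y)
    return observed / expected


# Hypothetical data: candy and milk are independent by construction.
independent = (
    [{"candy", "milk"}] * 9 + [{"candy"}] * 1 + [{"milk"}] * 81 + [{"bread"}] * 9
)
# Hypothetical data: bread and butter tend to be bought together.
related = (
    [{"bread", "butter"}] * 20 + [{"bread"}] * 10 + [{"butter"}] * 5 + [{"milk"}] * 65
)

lift_independent = lift(independent, {"candy"}, {"milk"})
lift_related = lift(related, {"bread"}, {"butter"})
print(f"candy -> milk:    lift = {lift_independent:.2f}")  # 1.00
print(f"bread -> butter:  lift = {lift_related:.2f}")      # 2.67
```

A lift near 1, as for candy and milk, is what statistical independence predicts; a lift well above 1, as for bread and butter, means the pair appears together more often than chance alone would explain.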
