An Overview of Data Mining Techniques - Data Mining Techniques in CRM: Inside Customer Segmentation

• The support: This assesses the rule's coverage, or "how many records the rule covers." It denotes the percentage of records that match the antecedents.

• The confidence: This assesses the strength and the predictive ability of the rule. It indicates "how likely the consequent is, given the antecedents." It denotes the percentage, or probability, of the consequent within the records that match the antecedents.

• The lift: This assesses the improvement in predictive ability when using the derived rule compared to randomness. It is defined as the ratio of the rule confidence to the prior confidence of the consequent. The prior confidence is the overall percentage of the consequent within all the analyzed records.
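The three measures defined above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular tool: the function name `rule_metrics` and the toy grocery baskets are assumptions made up for the example, and support is computed as the text defines it (share of records matching the antecedents).

```python
def rule_metrics(baskets, antecedents, consequent):
    """Return (support, confidence, lift) for the rule antecedents -> consequent.

    Support follows the definition used in the text: the share of
    records that match the antecedents.
    """
    baskets = [set(b) for b in baskets]
    antecedents = set(antecedents)

    n_total = len(baskets)
    matching = [b for b in baskets if antecedents <= b]
    n_match = len(matching)
    n_both = sum(1 for b in matching if consequent in b)
    n_consequent = sum(1 for b in baskets if consequent in b)

    if n_total == 0 or n_match == 0 or n_consequent == 0:
        raise ValueError("rule metrics are undefined for these records")

    support = n_match / n_total
    confidence = n_both / n_match
    prior_confidence = n_consequent / n_total
    lift = confidence / prior_confidence
    return support, confidence, lift


# Hypothetical toy data, invented purely for illustration.
toy_baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

support, confidence, lift = rule_metrics(toy_baskets, {"bread"}, "milk")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.4f}")
```

In this toy data the lift comes out below 1, which would mean the rule does worse than simply using the overall share of the consequent.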

In the presented example, Rule 2 associates product 1 with product 4 with a confidence of 71.4%. In plain English, it states that 71.4% of the baskets containing product 1, which is the antecedent, also contain product 4, the consequent. Additionally, the baskets containing product 1 comprise 77.8% of all the baskets analyzed. This measure is the support of the rule. Since six out of the nine total baskets contain product 4, the prior confidence of a basket containing product 4 is 6/9 or 67%, slightly lower than the rule confidence. Rule 2 therefore outperforms randomness: its lift is 71.4%/66.7% = 1.07, a confidence about 7% higher than the prior. Thus by using the rule, the chances of correctly identifying a product 4 purchase are improved by 7%.
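The arithmetic behind the Rule 2 figures can be checked with a short script. Only the totals of nine baskets and six baskets with product 4 are stated in the text; the counts of seven baskets with product 1 and five with both products are inferred here from the quoted percentages, so treat them as assumptions.

```python
n_baskets = 9      # stated in the text
n_consequent = 6   # baskets containing product 4, stated in the text
n_antecedent = 7   # baskets containing product 1, inferred from 77.8% support
n_both = 5         # baskets containing products 1 and 4, inferred from 71.4% confidence

support = n_antecedent / n_baskets
confidence = n_both / n_antecedent
prior_confidence = n_consequent / n_baskets
lift = confidence / prior_confidence

print(f"support={support:.1%} confidence={confidence:.1%} "
      f"prior={prior_confidence:.1%} lift={lift:.2f}")
# prints: support=77.8% confidence=71.4% prior=66.7% lift=1.07
```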

Rule 4 is more complicated since it contains two antecedents. It has a lower coverage (44.4%) but yields a higher confidence (75%) and lift (1.13). In plain English this rule states that baskets with products 1 and 3 present a strong chance (75%) of also containing product 4. Thus, there is a business opportunity to promote product 4 to all customers who check out with products 1 and 3 and have not bought product 4.
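Selecting the customers for such a promotion amounts to finding baskets that match the antecedents but lack the consequent. The sketch below assumes a hypothetical mapping from customer ID to basket contents; the IDs, the baskets, and the function name `cross_sell_targets` are invented for illustration.

```python
def cross_sell_targets(baskets, antecedents, consequent):
    """Return IDs of customers whose basket contains every antecedent
    but not the consequent."""
    antecedents = set(antecedents)
    return [
        customer_id
        for customer_id, items in baskets.items()
        if antecedents <= set(items) and consequent not in items
    ]


# Hypothetical checkout data: customer ID -> products in the basket.
checkout_baskets = {
    "c1": {1, 3, 4},
    "c2": {1, 3},
    "c3": {1, 4},
    "c4": {1, 2, 3},
    "c5": {3, 4},
}

targets = cross_sell_targets(checkout_baskets, antecedents={1, 3}, consequent=4)
print(targets)  # ['c2', 'c4']
```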

The rule development procedure can be controlled according to model parameters that analysts can specify. Specifically, analysts can define in advance the required threshold values for rule complexity, support, confidence, and lift in order to guide the rule growth process according to their specific requirements.
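As a rough sketch of how such thresholds act, the snippet below filters a list of already derived rules. Real association algorithms typically apply thresholds while the rules are being grown rather than afterwards; the `Rule` class, the `filter_rules` function, and its parameter names are assumptions for this example. The two rules carry the figures quoted in the text for Rule 2 and Rule 4.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    antecedents: FrozenSet[int]
    consequent: int
    support: float
    confidence: float
    lift: float


def filter_rules(rules: List[Rule],
                 min_support: float = 0.0,
                 min_confidence: float = 0.0,
                 min_lift: float = 0.0,
                 max_antecedents: Optional[int] = None) -> List[Rule]:
    """Keep only the rules that satisfy every threshold."""
    kept = []
    for rule in rules:
        # Rule complexity is measured here as the number of antecedents.
        if max_antecedents is not None and len(rule.antecedents) > max_antecedents:
            continue
        if (rule.support >= min_support
                and rule.confidence >= min_confidence
                and rule.lift >= min_lift):
            kept.append(rule)
    return kept


candidate_rules = [
    Rule("Rule 2", frozenset({1}), 4, support=0.778, confidence=0.714, lift=1.07),
    Rule("Rule 4", frozenset({1, 3}), 4, support=0.444, confidence=0.75, lift=1.13),
]

strict = filter_rules(candidate_rules, min_confidence=0.73, min_lift=1.1)
simple = filter_rules(candidate_rules, max_antecedents=1)
print([r.name for r in strict])  # ['Rule 4']
print([r.name for r in simple])  # ['Rule 2']
```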

Unlike decision trees, association models generate rules that overlap. Therefore, multiple rules may apply to each customer. The rules applicable to each customer are then sorted according to a selected performance measure, for instance lift or confidence, and a specified number n of rules, for instance the top three, is retained. The retained rules indicate the top n product suggestions, currently not in the basket, that best match each customer's profile. In this way, association models can help in cross-selling activities as they can provide specialized product recommendations for each customer. As in every data mining task, derived rules should also be evaluated with respect to their business meaning and "actionability" before deployment.
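The ranking step described above can be sketched as follows. This is one possible reading of the procedure, not a specific tool's scoring logic: the `recommend` function is hypothetical, the first two rules reuse the figures quoted for Rule 2 and Rule 4, and the remaining rules are invented so that there is something to rank.

```python
def recommend(basket, rules, n=3, measure="lift"):
    """Return up to n suggested products for a basket.

    A rule applies when all its antecedents are in the basket and its
    consequent is not. Applicable rules are ranked by the chosen measure,
    and each product is suggested at most once.
    """
    if n <= 0:
        return []
    basket = set(basket)
    applicable = [
        rule for rule in rules
        if set(rule["antecedents"]) <= basket and rule["consequent"] not in basket
    ]
    applicable.sort(key=lambda rule: rule[measure], reverse=True)

    suggestions = []
    for rule in applicable:
        if rule["consequent"] not in suggestions:
            suggestions.append(rule["consequent"])
        if len(suggestions) == n:
            break
    return suggestions


scoring_rules = [
    {"antecedents": {1}, "consequent": 4, "confidence": 0.714, "lift": 1.07},
    {"antecedents": {1, 3}, "consequent": 4, "confidence": 0.75, "lift": 1.13},
    # The rules below are hypothetical, added only to have several candidates.
    {"antecedents": {3}, "consequent": 2, "confidence": 0.60, "lift": 1.20},
    {"antecedents": {1}, "consequent": 5, "confidence": 0.50, "lift": 1.02},
    {"antecedents": {2}, "consequent": 6, "confidence": 0.90, "lift": 1.50},
]

print(recommend({1, 3}, scoring_rules, n=2))                        # [2, 4]
print(recommend({1, 3}, scoring_rules, n=2, measure="confidence"))  # [4, 2]
```

Changing the ranking measure changes the order of the suggestions, which is why the choice between lift and confidence is left to the analyst.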
