5.6 Validation and Testing
After the output rules have been gathered, one or more methods may be needed to
validate them in the business context of the sample dataset. The
first approach relies on statistical measures such as confidence,
lift, and leverage. Rules that involve mutually independent items or cover few
transactions are considered uninteresting because they may capture spurious
relationships.
As mentioned in Section 5.3, confidence measures the chance that X and Y appear
together relative to the chance that X appears. Confidence can therefore serve as
one indicator of how interesting a rule is.
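
As a minimal sketch of the confidence calculation described above, the following Python snippet computes support and confidence over a small, hypothetical list of transactions. The item names, the transaction data, and the helper names support and confidence are illustrative assumptions, not taken from the text.

```python
# Hypothetical toy transactions; the items are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(lhs, rhs, transactions):
    """Confidence(X -> Y) = Support(X and Y) / Support(X)."""
    joint = support(set(lhs) | set(rhs), transactions)
    return joint / support(lhs, transactions)

# {bread} appears in 4 of 5 transactions; {bread, milk} appears in 3 of 5.
conf_bread_milk = confidence({"bread"}, {"milk"}, transactions)
print(f"confidence(bread -> milk) = {conf_bread_milk:.2f}")
```

Here three of the four transactions containing bread also contain milk, so the confidence of the rule is three quarters.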
Lift and leverage both compare the joint support of X and Y against the support
expected from their individual supports. While mining data with association rules,
some of the rules generated could be purely coincidental. For example, if 95% of
customers buy X and 90% of customers buy Y, then X and Y would occur together at
least 85% of the time, even if there is no relationship between the two. Measures
like lift and leverage help distinguish interesting rules from coincidental ones.
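
The 95%/90% example can be checked numerically. The sketch below assumes the standard definitions, lift = Support(X and Y) / (Support(X) * Support(Y)) and leverage = Support(X and Y) - Support(X) * Support(Y); the function and variable names are illustrative assumptions.

```python
def lift(support_xy, support_x, support_y):
    """Ratio of observed joint support to the support expected under independence."""
    return support_xy / (support_x * support_y)

def leverage(support_xy, support_x, support_y):
    """Difference between observed joint support and support expected under independence."""
    return support_xy - support_x * support_y

support_x, support_y = 0.95, 0.90

# Smallest possible overlap of two sets covering 95% and 90% of transactions.
min_joint = max(0.0, support_x + support_y - 1.0)

# Joint support if X and Y are statistically independent.
independent_joint = support_x * support_y

for label, joint in [("minimum overlap", min_joint), ("independent", independent_joint)]:
    print(
        f"{label}: joint={joint:.3f}, "
        f"lift={lift(joint, support_x, support_y):.3f}, "
        f"leverage={leverage(joint, support_x, support_y):.3f}"
    )
```

Under independence the lift is 1 and the leverage is 0, even though the joint support and the confidence of the rule are both high. This is why a high co-occurrence rate alone does not indicate a meaningful relationship.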
Another set of criteria can be established through subjective arguments. Even with
a high confidence, a rule may be considered subjectively uninteresting unless it
reveals an unexpected, profitable action. For example, rules like
{paper}→{pencil} may not be subjectively interesting or meaningful despite
high support and confidence values. In contrast, a rule like {diaper}→{beer}
that satisfies both minimum support and minimum confidence can be considered
subjectively interesting because this rule is unexpected and may suggest a cross-sell
opportunity for the retailer. This incorporation of subjective knowledge into the
evaluation of rules can be a difficult task, and it requires collaboration with domain
experts. As seen in Chapter 2, “Data Analytics Lifecycle,” the domain experts may
serve as the business users or the business intelligence analysts as part of the Data
Science team. In Phase 5, the team can communicate the results and decide if it is
appropriate to operationalize them.