indicate a sharp drop-off in the importance value. Some algorithms
may produce negative attribute importance values. Such attributes
are likely noise, actually making the model accuracy or quality worse
than if they were not present.
4.5 Association
Association analysis is widely used in transaction data analysis for
directed marketing, catalog design, store layout, and other business
decision-making processes. Association is the mining function used
for market basket analysis, that is, the analysis of consumer behavior
for the discovery of relationships or correlations among a set of
items. For example, the presence of one set of items may imply the
presence of another item or set of items, as in “90 percent of the
people who buy milk and eggs also buy bread in the same transaction.”
Association identifies the attribute value conditions (items) that
frequently occur together in a given dataset and expresses them as rules.
The rules returned from an association model differ from
the rules produced by clustering models or classification decision
tree models. For example, decision tree rules are predicate-based,
meaning that they consist of a series of boolean-valued
expressions, such as “age < 45 AND income > 80,000 AND
owns_home = TRUE.” Association rules instead deal with discrete items:
each rule consists of two sets of items. One itemset, called the
antecedent, implies another itemset, called the consequent. If we have
an antecedent A and a consequent B, the rule can be written as A → B.
These two itemsets are found to occur together in some number of
transactions or market baskets in the provided data.
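
As a rough illustration of this structure, the sketch below represents a rule as an antecedent itemset and a consequent itemset and counts the baskets in which both occur. The class name AssociationRule, the helper functions, and the four baskets are assumptions made up for illustration; they are not part of any particular data mining API and do not reproduce the data in the book's figures. The two ratios it prints correspond to the support and confidence measures described next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociationRule:
    """A rule A -> B over discrete items (hypothetical helper type)."""
    antecedent: frozenset
    consequent: frozenset

    def matches(self, transaction):
        # True when both itemsets appear in the same transaction.
        return (self.antecedent | self.consequent) <= transaction


def support(rule, transactions):
    """Fraction of all transactions that contain both itemsets."""
    both = sum(1 for t in transactions if rule.matches(t))
    return both / len(transactions)


def confidence(rule, transactions):
    """Fraction of antecedent-containing transactions that also hold the consequent."""
    with_antecedent = [t for t in transactions if rule.antecedent <= t]
    if not with_antecedent:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both = sum(1 for t in with_antecedent if rule.consequent <= t)
    return both / len(with_antecedent)


# Made-up market baskets, for illustration only.
transactions = [
    frozenset({"milk", "bread", "eggs"}),
    frozenset({"milk", "bread"}),
    frozenset({"milk", "juice"}),
    frozenset({"eggs", "juice"}),
]

rule = AssociationRule(frozenset({"milk"}), frozenset({"bread"}))
print(support(rule, transactions))     # share of baskets with milk and bread
print(confidence(rule, transactions))  # share of milk baskets that also have bread
```

Using sets for both the itemsets and the baskets keeps the containment checks to simple subset tests; a real association algorithm would also have to search for candidate rules rather than score one given rule.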
Support and confidence metrics are used as quality measures of
the rules within an association model. The support of a rule indicates
how frequently the items associated in the rule occur together; for
example, milk, eggs, and bread occur together in 22 percent of the
transactions. The confidence of a rule indicates the probability of finding
both the antecedent itemset and the consequent itemset in the same
transaction, given that the antecedent is found. An example is
illustrated in Figure 4-4, where there are four transactions, each with
some purchased items. The association algorithm found the rule
“milk implies bread,” where milk is the antecedent and bread is the
consequent. First, we count the number of transactions that contain
both milk and bread. Since there are two, the support for this rule
is 50 percent (2/4). The confidence of this rule is determined by