Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods - Data Mining: Concepts and Techniques


Similarly, in $D_3$, the four new measures correctly show that m and c are strongly negatively associated because the mc to c ratio equals the mc to m ratio, that is, 100/1100 = 9.1%. However, lift and $\chi^2$ both contradict this in an incorrect way: Their values for $D_2$ are between those for $D_1$ and $D_3$.

For data set $D_4$, both lift and $\chi^2$ indicate a highly positive association between m and c, whereas the others indicate a “neutral” association because the ratio of $mc$ to $\overline{m}c$ equals the ratio of $mc$ to $m\overline{c}$, which is 1. This means that if a customer buys coffee (or milk), the probability that he or she will also purchase milk (or coffee) is exactly 50%.
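The two cases above can be checked numerically. The following is a minimal sketch, assuming the usual definitions of the four measures in terms of the conditional probabilities P(c|m) and P(m|c); the function name, the parameter names, and the balanced counts are illustrative assumptions, not values taken from the text. Only the 100/1100 case mirrors the ratio quoted above.

```python
from math import sqrt

def null_invariant_measures(mc, m_only, c_only):
    """Four null-invariant measures from three counts (hypothetical helper).

    mc     -- transactions containing both milk and coffee
    m_only -- transactions containing milk but not coffee
    c_only -- transactions containing coffee but not milk
    """
    sup_m = mc + m_only          # support count of milk
    sup_c = mc + c_only          # support count of coffee
    p_c_given_m = mc / sup_m     # P(coffee | milk)
    p_m_given_c = mc / sup_c     # P(milk | coffee)
    return {
        "all_conf": min(p_c_given_m, p_m_given_c),
        "max_conf": max(p_c_given_m, p_m_given_c),
        "kulc": 0.5 * (p_c_given_m + p_m_given_c),
        "cosine": mc / sqrt(sup_m * sup_c),
    }

# Assumed balanced counts: mc equals both one-sided counts (ratios of 1).
balanced = null_invariant_measures(mc=1000, m_only=1000, c_only=1000)

# Counts consistent with the 100/1100 ratio quoted in the text.
negative = null_invariant_measures(mc=100, m_only=1000, c_only=1000)

print(balanced)
print(negative)
```

With equal counts every measure evaluates to 0.5, the "neutral" reading; with 100 joint purchases against 1000 one-sided purchases each measure drops to 100/1100. Note that the number of null-transactions is not an input at all.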

“Why are lift and $\chi^2$ so poor at distinguishing pattern association relationships in the previous transactional data sets?” To answer this, we have to consider the null-transactions. A null-transaction is a transaction that does not contain any of the itemsets being examined. In our example, $\overline{m}\,\overline{c}$ represents the number of null-transactions. Lift and $\chi^2$ have difficulty distinguishing interesting pattern association relationships because they are both strongly influenced by $\overline{m}\,\overline{c}$. Typically, the number of null-transactions can outweigh the number of individual purchases because, for example, many people may buy neither milk nor coffee. On the other hand, the other four measures are good indicators of interesting pattern associations because their definitions remove the influence of $\overline{m}\,\overline{c}$ (i.e., they are not influenced by the number of null-transactions).

This discussion shows that it is highly desirable to have a measure whose value is independent of the number of null-transactions. A measure is null-invariant if its value is free from the influence of null-transactions. Null-invariance is an important property for measuring association patterns in large transaction databases. Among the six discussed measures in this subsection, only lift and $\chi^2$ are not null-invariant measures.
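To make the effect of null-transactions concrete, here is a small sketch that holds the milk and coffee counts fixed and changes only the null-transaction count. It assumes the standard definitions of lift and of the Pearson $\chi^2$ statistic on a 2 x 2 contingency table; the function names and all counts are hypothetical choices for illustration.

```python
def lift(mc, m_only, c_only, null):
    """Lift of milk and coffee: P(m and c) / (P(m) * P(c))."""
    n = mc + m_only + c_only + null
    return mc * n / ((mc + m_only) * (mc + c_only))

def chi_square(mc, m_only, c_only, null):
    """Pearson chi-square statistic for the 2 x 2 contingency table."""
    # Rows: milk / no milk. Columns: coffee / no coffee.
    observed = [[mc, m_only], [c_only, null]]
    n = mc + m_only + c_only + null
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed[i][j] - expected) ** 2 / expected
    return total

def kulczynski(mc, m_only, c_only, null):
    """Kulczynski measure; the null count is accepted but never used."""
    return 0.5 * (mc / (mc + m_only) + mc / (mc + c_only))

# Same purchase counts, two assumed null-transaction counts.
few_nulls = dict(mc=100, m_only=1000, c_only=1000, null=100)
many_nulls = dict(mc=100, m_only=1000, c_only=1000, null=100_000)

for name, counts in (("few nulls", few_nulls), ("many nulls", many_nulls)):
    print(name, lift(**counts), chi_square(**counts), kulczynski(**counts))
```

Under these assumed counts, lift moves from below 1 to above 1 purely because more null-transactions were added, and $\chi^2$ changes as well, while the Kulczynski value is identical in both runs.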

“Among the all confidence, max confidence, Kulczynski, and cosine measures, which is best at indicating interesting pattern relationships?”

To answer this question, we introduce the imbalance ratio ( IR ), which assesses the

imbalance of two itemsets, A and B , in rule implications. It is defined as

\[
IR(A, B) = \frac{|\mathit{sup}(A) - \mathit{sup}(B)|}{\mathit{sup}(A) + \mathit{sup}(B) - \mathit{sup}(A \cup B)} \tag{6.13}
\]

where the numerator is the absolute value of the difference between the support of the itemsets A and B, and the denominator is the number of transactions containing A or B. If the two directional implications between A and B are the same, then $IR(A, B)$ will be zero. Otherwise, the larger the difference between the two, the larger the imbalance ratio. This ratio is independent of the number of null-transactions and independent of the total number of transactions.
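As a rough illustration of the imbalance ratio, the sketch below computes it from support counts, where the third argument is the number of transactions containing both A and B. The function name, the argument names, and the example counts are assumptions made for this example.

```python
def imbalance_ratio(sup_a, sup_b, sup_ab):
    """Imbalance ratio of itemsets A and B from support counts.

    sup_a  -- number of transactions containing A
    sup_b  -- number of transactions containing B
    sup_ab -- number of transactions containing both A and B
    """
    # Transactions containing A or B (inclusion-exclusion).
    containing_either = sup_a + sup_b - sup_ab
    if containing_either == 0:
        raise ValueError("neither itemset occurs in any transaction")
    return abs(sup_a - sup_b) / containing_either

# Hypothetical balanced case: both itemsets are equally frequent.
ir_balanced = imbalance_ratio(sup_a=1100, sup_b=1100, sup_ab=100)

# Hypothetical skewed case: A is far more frequent than B.
ir_skewed = imbalance_ratio(sup_a=1100, sup_b=200, sup_ab=100)

print(ir_balanced, ir_skewed)
```

Neither the null-transaction count nor the total transaction count appears among the arguments, which reflects the independence property stated above.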

Let's continue examining the remaining data sets in Example 6.10.


Example 6.11 Comparing null-invariant measures in pattern evaluation. Although the four measures introduced in this section are null-invariant, they may present dramatically
