Table 6.6 2 × 2 Contingency Table Summarizing the Transactions with Respect to Game and Video Purchases

            game      ¬game     Σ_row
video       4000      3500      7500
¬video      2000       500      2500
Σ_col       6000      4000      10,000
Table 6.7 Table 6.6 Contingency Table, Now with the Expected Values

            game           ¬game          Σ_row
video       4000 (4500)    3500 (3000)    7500
¬video      2000 (1500)     500 (1000)    2500
Σ_col       6000           4000           10,000
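
The expected values shown in parentheses can be reproduced under the usual independence assumption, where each slot's expected count is its row total times its column total divided by the total number of transactions. The symbols $e_{ij}$, $r_i$, $c_j$, and $n$ are notation assumed for this sketch and do not appear in the table:

\[
\begin{aligned}
e_{ij} &= \frac{r_i \, c_j}{n}, \\
e_{\text{video},\,\text{game}} &= \frac{7500 \times 6000}{10{,}000} = 4500, &
e_{\text{video},\,\neg\text{game}} &= \frac{7500 \times 4000}{10{,}000} = 3000, \\
e_{\neg\text{video},\,\text{game}} &= \frac{2500 \times 6000}{10{,}000} = 1500, &
e_{\neg\text{video},\,\neg\text{game}} &= \frac{2500 \times 4000}{10{,}000} = 1000.
\end{aligned}
\]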
Example 6.9 Correlation analysis using $\chi^2$. To compute the correlation using $\chi^2$ analysis for nominal data, we need the observed value and expected value (displayed in parentheses) for each slot of the contingency table, as shown in Table 6.7. From the table, we can compute the $\chi^2$ value as follows:
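\[
\begin{aligned}
\chi^2 &= \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \\
&= \frac{(4000 - 4500)^2}{4500} + \frac{(3500 - 3000)^2}{3000} + \frac{(2000 - 1500)^2}{1500} + \frac{(500 - 1000)^2}{1000} \\
&= 555.6.
\end{aligned}
\]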
Because the $\chi^2$ value is greater than 1, and the observed value of the slot (game, video) = 4000, which is less than the expected value of 4500, buying game and buying video are negatively correlated. This is consistent with the conclusion derived from the analysis of the lift measure in Example 6.8.
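
The arithmetic of this example can be checked with a short script. The following is a minimal sketch in plain Python, not a reference implementation: the function names `expected_counts` and `chi_square` are hypothetical helpers introduced here, and the row and column ordering is an assumption made to match Table 6.6.

```python
# Observed counts from Table 6.6.
# Rows: (video, not video); columns: (game, not game). Ordering is assumed.
observed = [
    [4000, 3500],
    [2000, 500],
]


def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson chi-square: sum over slots of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


expected = expected_counts(observed)
chi2 = chi_square(observed)

# Observed-to-expected ratio for the (game, video) slot; a value below 1
# means the two purchases co-occur less often than independence predicts.
ratio = observed[0][0] / expected[0][0]

print(expected)         # [[4500.0, 3000.0], [1500.0, 1000.0]]
print(round(chi2, 1))   # 555.6
print(round(ratio, 2))  # 0.89
```

The observed-to-expected ratio for the (game, video) slot, 4000/4500 ≈ 0.89, is algebraically the same quantity as the lift of game and video computed from these counts, which is why the two measures point in the same direction here.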
6.3.3 A Comparison of Pattern Evaluation Measures
The above discussion shows that instead of using the simple support-confidence framework to evaluate frequent patterns, other measures, such as lift and $\chi^2$, often disclose
more intrinsic pattern relationships. How effective are these measures? Should we also
consider other alternatives?
Researchers have studied many pattern evaluation measures even before the start of
in-depth research on scalable methods for mining frequent patterns. Recently, several
other pattern evaluation measures have attracted interest. In this subsection, we present