6.13 Give a short example to show that items in a strong association rule actually may be negatively correlated.
6.14 The following contingency table summarizes supermarket transaction data, where hot dogs refers to the transactions containing hot dogs, ¬hot dogs refers to the transactions that do not contain hot dogs, hamburgers refers to the transactions containing hamburgers, and ¬hamburgers refers to the transactions that do not contain hamburgers.

                   hot dogs   ¬hot dogs   Σ_row
    hamburgers         2000         500    2500
    ¬hamburgers        1000        1500    2500
    Σ_col              3000        2000    5000
(a) Suppose that the association rule "hot dogs ⇒ hamburgers" is mined. Given a minimum support threshold of 25% and a minimum confidence threshold of 50%, is this association rule strong?
(b) Based on the given data, is the purchase of hot dogs independent of the purchase of hamburgers? If not, what kind of correlation relationship exists between the two?
(c) Compare the use of the all_confidence, max_confidence, Kulczynski, and cosine measures with lift and correlation on the given data.
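The measures involved in parts (a)-(c) can be checked directly from the contingency table; a minimal sketch using the counts given above:

```python
from math import sqrt

# Counts from the Exercise 6.14 contingency table.
n = 5000
n_hd = 3000      # transactions containing hot dogs
n_hb = 2500      # transactions containing hamburgers
n_both = 2000    # transactions containing both

sup_hd, sup_hb, sup_both = n_hd / n, n_hb / n, n_both / n

support = sup_both              # 0.40 >= 0.25
confidence = sup_both / sup_hd  # ~0.667 >= 0.50, so the rule is strong

# lift > 1 indicates positive correlation (not independence).
lift = sup_both / (sup_hd * sup_hb)  # 0.40 / 0.30 ~= 1.33

# Null-invariant pattern evaluation measures.
all_conf = sup_both / max(sup_hd, sup_hb)             # ~0.667
max_conf = sup_both / min(sup_hd, sup_hb)             # 0.80
kulc = 0.5 * (sup_both / sup_hd + sup_both / sup_hb)  # ~0.733
cosine = sup_both / sqrt(sup_hd * sup_hb)             # ~0.730
```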
6.15 (Implementation project) The DBLP data set (www.informatik.uni-trier.de/~ley/db/) consists of over one million entries of research papers published in computer science conferences and journals. Among these entries, there are a good number of authors that have coauthor relationships.
(a) Propose a method to efficiently mine a set of coauthor relationships that are
closely correlated (e.g., often coauthoring papers together).
(b) Based on the mining results and the pattern evaluation measures discussed in
this chapter, discuss which measure may convincingly uncover close collabora-
tion patterns better than others.
(c) Based on the study in (a), develop a method that can roughly predict advi-
sor and advisee relationships and the approximate period for such advisory
supervision.
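A minimal sketch of the counting step behind part (a), using hypothetical toy records in place of the real DBLP dump (which would need to be parsed from its XML format); candidate coauthor pairs are ranked by the Kulczynski measure, a null-invariant choice that is not distorted by the many papers mentioning neither author:

```python
from itertools import combinations
from collections import Counter

# Toy stand-in for DBLP records: each paper is a set of author names.
# (Hypothetical data for illustration only.)
papers = [
    {"A. Advisor", "B. Student"},
    {"A. Advisor", "B. Student", "C. Colleague"},
    {"A. Advisor", "B. Student"},
    {"C. Colleague", "D. Other"},
]

author_count = Counter()  # papers per author
pair_count = Counter()    # papers per coauthor pair
for authors in papers:
    author_count.update(authors)
    for pair in combinations(sorted(authors), 2):
        pair_count[pair] += 1

def kulc(pair):
    """Kulczynski measure: average of the two conditional supports."""
    a, b = pair
    ab = pair_count[pair]
    return 0.5 * (ab / author_count[a] + ab / author_count[b])

# Most closely correlated pairs first.
ranked = sorted(pair_count, key=kulc, reverse=True)
```

For part (c), one heuristic on top of such counts is to compare the two authors' publication histories: an advisor typically has papers predating the shared ones, and the collaboration period is bounded by the first and last jointly authored papers.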
6.6 Bibliographic Notes
Association rule mining was first proposed by Agrawal, Imielinski, and Swami [AIS93].
The Apriori algorithm discussed in Section 6.2.1 for frequent itemset mining was presented in Agrawal and Srikant [AS94b]. A variation of the algorithm using a similar pruning heuristic was developed independently by Mannila, Toivonen, and Verkamo