6.13 Give a short example to show that items in a strong association rule actually may be negatively correlated.
6.14 The following contingency table summarizes supermarket transaction data, where hot dogs refers to the transactions containing hot dogs, ¬hot dogs refers to the transactions that do not contain hot dogs, hamburgers refers to the transactions containing hamburgers, and ¬hamburgers refers to the transactions that do not contain hamburgers.

                   hot dogs   ¬hot dogs   Σ_row
    hamburgers         2000         500    2500
    ¬hamburgers        1000        1500    2500
    Σ_col              3000        2000    5000
(a) Suppose that the association rule "hot dogs ⇒ hamburgers" is mined. Given a minimum support threshold of 25% and a minimum confidence threshold of 50%, is this association rule strong?
(b) Based on the given data, is the purchase of hot dogs independent of the purchase of hamburgers? If not, what kind of correlation relationship exists between the two?
(c) Compare the use of the all_confidence, max_confidence, Kulczynski, and cosine measures with lift and correlation on the given data.
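The measures involved in parts (a)-(c) can be checked directly from the contingency table; a minimal sketch using the counts given above:

```python
from math import sqrt

# Counts from the Exercise 6.14 contingency table.
n = 5000
n_hd = 3000      # transactions containing hot dogs
n_hb = 2500      # transactions containing hamburgers
n_both = 2000    # transactions containing both

sup_hd, sup_hb, sup_both = n_hd / n, n_hb / n, n_both / n

support = sup_both              # 0.40 >= 0.25
confidence = sup_both / sup_hd  # ~0.667 >= 0.50, so the rule is strong

# lift > 1 indicates positive correlation (not independence).
lift = sup_both / (sup_hd * sup_hb)  # 0.40 / 0.30 ~= 1.33

# Null-invariant pattern evaluation measures.
all_conf = sup_both / max(sup_hd, sup_hb)             # ~0.667
max_conf = sup_both / min(sup_hd, sup_hb)             # 0.80
kulc = 0.5 * (sup_both / sup_hd + sup_both / sup_hb)  # ~0.733
cosine = sup_both / sqrt(sup_hd * sup_hb)             # ~0.730
```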
6.15 (Implementation project) The DBLP data set (www.informatik.uni-trier.de/~ley/db/) consists of over one million entries of research papers published in computer science conferences and journals. Among these entries, there are a good number of authors that have coauthor relationships.
(a) Propose a method to efficiently mine a set of coauthor relationships that are
closely correlated (e.g., often coauthoring papers together).
(b) Based on the mining results and the pattern evaluation measures discussed in
this chapter, discuss which measure may convincingly uncover close collabora-
tion patterns better than others.
(c) Based on the study in (a), develop a method that can roughly predict advi-
sor and advisee relationships and the approximate period for such advisory
supervision.
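A minimal sketch of the counting step behind part (a), using hypothetical toy records in place of the real DBLP dump (which would need to be parsed from its XML format); candidate coauthor pairs are ranked by the Kulczynski measure, a null-invariant choice that is not distorted by the many papers mentioning neither author:

```python
from itertools import combinations
from collections import Counter

# Toy stand-in for DBLP records: each paper is a set of author names.
# (Hypothetical data for illustration only.)
papers = [
    {"A. Advisor", "B. Student"},
    {"A. Advisor", "B. Student", "C. Colleague"},
    {"A. Advisor", "B. Student"},
    {"C. Colleague", "D. Other"},
]

author_count = Counter()  # papers per author
pair_count = Counter()    # papers per coauthor pair
for authors in papers:
    author_count.update(authors)
    for pair in combinations(sorted(authors), 2):
        pair_count[pair] += 1

def kulc(pair):
    """Kulczynski measure: average of the two conditional supports."""
    a, b = pair
    ab = pair_count[pair]
    return 0.5 * (ab / author_count[a] + ab / author_count[b])

# Most closely correlated pairs first.
ranked = sorted(pair_count, key=kulc, reverse=True)
```

For part (c), one heuristic on top of such counts is to compare the two authors' publication histories: an advisor typically has papers predating the shared ones, and the collaboration period is bounded by the first and last jointly authored papers.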
6.6 Bibliographic Notes
Association rule mining was first proposed by Agrawal, Imielinski, and Swami [AIS93].
The Apriori algorithm discussed in Section 6.2.1 for frequent itemset mining was presented in Agrawal and Srikant [AS94b]. A variation of the algorithm using a similar pruning heuristic was developed independently by Mannila, Toivonen, and Verkamo