is the 2-tuple {Beer, Diapers}, which suggests that the items Beer and Diapers
are often bought together. It may therefore be useful to stock these items on
shelves that are located close to each other. Furthermore, such information is also
useful for making promotional decisions based on previous customer buying behavior.
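To make the notion of a frequent 2-tuple concrete, here is a minimal sketch that counts the support of every item pair in a set of market baskets and keeps those above a threshold. It assumes support is measured as the fraction of transactions containing the pair. It uses brute-force pair enumeration for clarity rather than an efficient algorithm such as Apriori or FP-growth, and the baskets and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return {pair: support} for every 2-itemset whose support (fraction of
    transactions containing it) is at least min_support."""
    counts = Counter()
    for t in transactions:
        # Deduplicate and sort so each unordered pair is counted once per transaction.
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical market baskets.
baskets = [
    {"Beer", "Diapers", "Milk"},
    {"Beer", "Diapers"},
    {"Bread", "Milk"},
    {"Beer", "Diapers", "Bread"},
]
print(frequent_pairs(baskets, min_support=0.5))  # {('Beer', 'Diapers'): 0.75}
```

Only {Beer, Diapers} appears in at least half of the baskets (three of four), so it is the only pair reported at this threshold.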
Sequential pattern mining [ 12 ] is used in very similar scenarios, except that the
transactions may have a temporal component. In some cases, the temporal aspect of the
data may be significant from the perspective of analysis. For example, a customer is
more likely to buy a particular kind of printer ink only after she has already bought
the relevant printer. Therefore, when information about earlier periods is available,
the temporal aspect of buying behavior provides more refined information for targeting
purposes.
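The printer-and-ink example can be illustrated with a minimal sketch of sequential support: a pattern is supported by a customer if its items occur in that customer's time-ordered purchase history in the same order, though not necessarily consecutively. As a simplifying assumption, each history here is a list of single items rather than a sequence of itemsets, and the histories and function names are hypothetical.

```python
def contains_sequence(history, pattern):
    """True if the items of `pattern` occur in `history` in the same order
    (not necessarily contiguously)."""
    it = iter(history)
    # `item in it` consumes the iterator up to and including the match,
    # so later pattern items can only match later purchases.
    return all(item in it for item in pattern)

def sequence_support(histories, pattern):
    """Fraction of customer histories that contain `pattern` as a subsequence."""
    return sum(contains_sequence(h, pattern) for h in histories) / len(histories)

# Hypothetical time-ordered purchase histories, one list per customer.
histories = [
    ["Printer", "Paper", "Ink"],
    ["Ink", "Printer"],
    ["Printer", "Ink", "Ink"],
    ["Paper"],
]
print(sequence_support(histories, ["Printer", "Ink"]))  # 0.5
print(sequence_support(histories, ["Ink", "Printer"]))  # 0.25
```

The two orderings of the same pair of items receive different support, which is exactly the information that an unordered itemset count would lose.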
In general, however, frequent pattern mining is more useful as a subroutine, even in
these applications. For example, in a customer targeting application, a rule-based
classifier can be constructed from the discovered frequent patterns. In some cases,
constraints may be used to further refine the discovered patterns [ 107 , 109 ],
whereas in other cases the sequential patterns may be used to make recommendations.
This distinction is important because the vanilla problem of frequent pattern mining
is almost never used in applications on a stand-alone basis. Some of these
applications are discussed in detail in the following subsections.
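As one illustration of how a rule-based classifier can be built from frequent patterns, here is a minimal sketch loosely in the spirit of associative classification. Each candidate pattern becomes a rule "pattern → label" if its confidence on labeled training records reaches a threshold, and a new customer receives the label of the most confident matching rule. This is an assumed, simplified scheme rather than any specific published method; the records, labels, and function names are hypothetical, and the candidate patterns would normally come from a frequent pattern miner.

```python
def build_rules(records, patterns, min_conf):
    """Turn candidate patterns into classification rules.

    records:  list of (item set, class label) pairs.
    patterns: candidate itemsets (frozensets), e.g. frequent patterns.
    Returns (pattern, label, confidence) triples, most confident first.
    """
    rules = []
    for p in patterns:
        # Labels of all training records whose items contain the pattern.
        covered = [label for items, label in records if p <= items]
        if not covered:
            continue
        for label in sorted(set(covered)):
            conf = covered.count(label) / len(covered)
            if conf >= min_conf:
                rules.append((p, label, conf))
    # Prefer higher confidence; break ties in favor of more specific patterns.
    rules.sort(key=lambda r: (-r[2], -len(r[0])))
    return rules

def classify(rules, items, default):
    """Return the label of the first (best) rule whose pattern the items contain."""
    for pattern, label, _ in rules:
        if pattern <= items:
            return label
    return default

# Hypothetical labeled purchase records for a targeting task.
records = [
    ({"Printer", "Paper"}, "buys_ink"),
    ({"Printer"}, "buys_ink"),
    ({"Printer", "Cable"}, "no_ink"),
    ({"Paper"}, "no_ink"),
]
patterns = [frozenset({"Printer"}), frozenset({"Paper"}), frozenset({"Printer", "Paper"})]
rules = build_rules(records, patterns, min_conf=0.6)
print(classify(rules, {"Printer", "Scanner"}, default="unknown"))  # buys_ink
print(classify(rules, {"Paper"}, default="unknown"))  # unknown
```

Falling back to a default label when no rule fires is a deliberate design choice here; practical systems usually also prune redundant rules.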
Frequent Patterns for Clustering
The problem of frequent pattern mining is closely related to other data mining problems
such as clustering. The simplest relationship between clustering and frequent
patterns is discussed in [ 124 ], where large (i.e., frequent) items are used to enable
the clustering process. The idea is that transactions in the same cluster will have a
large overlap in their frequent items. Much more sophisticated clustering methods are
possible if correlations among the items are used directly in the clustering process.
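The idea that transactions in the same cluster overlap heavily in their frequent items can be sketched as follows. Each transaction is reduced to its globally frequent items. It then joins the first existing cluster whose accumulated item set is similar enough under Jaccard similarity, or otherwise starts a new cluster. This greedy single-pass scheme, the Jaccard measure, the thresholds, and the function names are all assumptions chosen for illustration. It is not the specific method of [ 124 ].

```python
from collections import Counter

def frequent_items(transactions, min_count):
    """Items that occur in at least min_count transactions."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item for item, c in counts.items() if c >= min_count}

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_by_frequent_items(transactions, min_count, threshold):
    """Greedy single-pass clustering on the frequent items of each transaction.
    Returns a list of clusters, each a list of transaction indices."""
    freq = frequent_items(transactions, min_count)
    clusters = []  # list of (cluster item set, member indices)
    for idx, t in enumerate(transactions):
        proj = set(t) & freq  # keep only the frequent items
        for rep, members in clusters:
            if jaccard(rep, proj) >= threshold:
                members.append(idx)
                rep |= proj  # grow the cluster's item set in place
                break
        else:
            clusters.append((set(proj), [idx]))
    return [members for _, members in clusters]

# Hypothetical transactions.
txns = [
    {"Beer", "Diapers", "Chips"},
    {"Beer", "Diapers"},
    {"Bread", "Milk", "Eggs"},
    {"Bread", "Milk"},
    {"Beer", "Diapers", "Milk"},
]
print(cluster_by_frequent_items(txns, min_count=2, threshold=0.5))  # [[0, 1, 4], [2, 3]]
```

Rare items such as Chips and Eggs are discarded before comparison, so they neither help nor hurt cluster membership. This reflects the intuition that frequent items carry the cluster structure.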
In particular, the original definition of subspace clustering [ 14 ] is closely related
to the problem of association rule mining. The CLIQUE algorithm discretizes the
original data into intervals and uses these intervals as pseudo-items in order to
determine relevant patterns. A density measure is used as a surrogate for support
in order to determine the frequent patterns. Specifically, the density measure requires
that each grid cell contain a minimum number of data points in order to be considered
a relevant candidate. The resulting dense k-dimensional grid cells are then connected
together to create the broader contours of the subspace clusters in the data. A related
method known as ENCLUS [ 32 ] was proposed, in which the subspace clusters are
quantified with an entropy measure rather than a density-based measure. Such an
entropy-based measure sometimes has advantages because of better normalization. Since
then, a significant amount of work has been done in the area of subspace clustering.
These techniques have been used both in the context of biclustering [ 93 , 126 ] of
discrete data and in the context of projected clustering [ 137 ]. Such methods have
also been used for a variety of