Applications of Frequent Pattern Mining - Frequent Pattern Mining


is the 2-tuple {Beer, Diapers}, which suggests that the items Beer and Diapers

are often bought together. This suggests that it may be useful to stock these items in

shelves which are located close to each other. Furthermore, such information is also

useful for making promotion decisions based on previous customer buying behavior.
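
As a minimal sketch of how such a pair might be found, the following counts the number of transactions containing each pair of items and keeps the pairs that meet a minimum support. The function name `frequent_pairs`, the sample baskets, and the threshold are illustrative assumptions and are not taken from the text.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return {pair: count} for item pairs found in at least min_support transactions."""
    counts = Counter()
    for basket in transactions:
        # Sort and de-duplicate so each unordered pair is counted once per transaction.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical market-basket data.
baskets = [
    ["Beer", "Diapers", "Milk"],
    ["Beer", "Diapers"],
    ["Bread", "Milk"],
    ["Beer", "Bread", "Diapers"],
]
result = frequent_pairs(baskets, min_support=3)
print(result)
```

Counting all pairs by brute force is only practical for small item vocabularies; it is used here purely to make the notion of support concrete.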

Sequential pattern mining [12] is used in very similar scenarios,

except that a temporal component may exist in the transactions. In some cases, the

temporal aspect of the data may be significant from the perspective of analysis. For

example, a customer is more likely to buy a particular kind of printer ink only after

she has already bought the relevant printer. Therefore, the temporal aspect of the

buying behavior provides more refined information for targeting purposes, when

information about earlier periods is available.
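
The role of ordering can be illustrated with a small sketch that counts how many purchase histories contain one item followed later by another. The helper `ordered_support` and the sample histories are assumptions made for illustration; a real sequential pattern miner would handle longer patterns and time constraints.

```python
def ordered_support(sequences, first, second):
    """Count sequences in which `first` is followed, at some later position, by `second`."""
    count = 0
    for seq in sequences:
        if first in seq:
            start = seq.index(first)  # earliest occurrence of `first`
            if second in seq[start + 1:]:
                count += 1
    return count

# Hypothetical per-customer purchase histories, in time order.
histories = [
    ["Printer", "Paper", "Ink"],
    ["Ink", "Printer"],
    ["Printer", "Ink"],
    ["Paper"],
]
printer_then_ink = ordered_support(histories, "Printer", "Ink")
ink_then_printer = ordered_support(histories, "Ink", "Printer")
print(printer_then_ink, ink_then_printer)
```

An unordered itemset count would treat the two directions identically, whereas the ordered counts differ, which is the extra information the temporal component provides.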

In general, however, frequent pattern mining is more useful as a subroutine, even in

these applications. For example, in a customer targeting application, a rule-based

classifier can be constructed from the discovered frequent patterns. In some cases,

constraints may be used in order to further refine the discovered patterns [107, 109],

whereas in other cases the sequential patterns may be used in order to make

recommendations. This distinction is important because the vanilla problem of frequent

pattern mining is almost never used in applications on a stand-alone basis. Some of

these applications are discussed in detail in the following subsections.
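
To make the subroutine role concrete, the following is a simplified sketch of a rule-based classifier built on top of frequent patterns. It is not the method of any cited reference: the names `mine_rules` and `classify`, the restriction to itemsets of size one or two, the labels, and the thresholds are all assumptions chosen for brevity.

```python
from collections import Counter, defaultdict
from itertools import combinations

def mine_rules(labeled, min_support, min_conf):
    """Build rules (itemset -> label) from frequent itemsets of size 1 or 2."""
    total = Counter()
    by_label = defaultdict(Counter)
    for items, label in labeled:
        unique = sorted(set(items))
        for k in (1, 2):
            for itemset in combinations(unique, k):
                total[itemset] += 1
                by_label[itemset][label] += 1
    rules = []
    for itemset, n in total.items():
        if n < min_support:
            continue  # pattern is not frequent
        label, hits = by_label[itemset].most_common(1)[0]
        conf = hits / n
        if conf >= min_conf:
            rules.append((conf, n, frozenset(itemset), label))
    # Highest confidence first, ties broken by higher support.
    rules.sort(key=lambda r: (-r[0], -r[1]))
    return rules

def classify(rules, items, default):
    """Return the label of the first (best-ranked) rule whose itemset is contained in `items`."""
    basket = set(items)
    for _conf, _n, itemset, label in rules:
        if itemset <= basket:
            return label
    return default

# Hypothetical customer-targeting data: (purchased items, campaign outcome).
labeled = [
    (["Printer", "Ink"], "responds"),
    (["Printer", "Ink", "Paper"], "responds"),
    (["Printer"], "ignores"),
    (["Paper"], "ignores"),
    (["Paper", "Stapler"], "ignores"),
]
rules = mine_rules(labeled, min_support=2, min_conf=0.8)
print(classify(rules, ["Printer", "Ink"], default="ignores"))
```

The frequent pattern step only supplies candidate rule antecedents; the ranking and the default label are application-level choices layered on top of it.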


3 Frequent Patterns for Clustering

The problem of frequent pattern mining is closely related to other data mining problems

such as clustering. The simplest relationship between clustering and frequent

patterns is discussed in [124], where large items are used in order to enable the

clustering process. The idea is that clusters of transactions will have large overlaps

between their frequent items. Much more sophisticated methods for clustering are

possible if correlations among the items are used directly in the clustering process.
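
The overlap idea can be sketched with a simple greedy procedure that assigns each transaction to the cluster whose large (frequent) items it shares the most. This is only an illustration of the intuition and is not the algorithm of [124]; the names `large_items` and `greedy_cluster`, the thresholds, and the data are assumptions.

```python
from collections import Counter

def large_items(cluster, min_fraction):
    """Items contained in at least min_fraction of the cluster's transactions."""
    counts = Counter(item for t in cluster for item in set(t))
    return {item for item, n in counts.items() if n >= min_fraction * len(cluster)}

def greedy_cluster(transactions, min_fraction=0.5, min_overlap=1):
    """Assign each transaction to the cluster with the largest large-item overlap."""
    clusters = []
    for t in transactions:
        items = set(t)
        best, best_overlap = None, 0
        for cluster in clusters:
            overlap = len(items & large_items(cluster, min_fraction))
            if overlap > best_overlap:
                best, best_overlap = cluster, overlap
        if best is not None and best_overlap >= min_overlap:
            best.append(t)
        else:
            clusters.append([t])  # no sufficiently similar cluster: start a new one
    return clusters

# Hypothetical transactions.
transactions = [
    ["Beer", "Diapers"],
    ["Beer", "Diapers", "Chips"],
    ["Printer", "Ink"],
    ["Ink", "Paper"],
    ["Beer", "Chips"],
]
clusters = greedy_cluster(transactions)
print(clusters)
```

The result of such a greedy pass depends on the order in which transactions arrive, which is one reason more sophisticated formulations are used in practice.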

In particular, the original definition of subspace clustering [14] is closely related

to the problem of association rule mining. The CLIQUE algorithm discretizes the

original data into intervals, and uses these intervals as pseudo-items in order to

determine relevant patterns. A density measure is used as a surrogate for the support in

order to determine the frequent patterns. Specifically, the density measure requires

that each cell should contain a particular minimum number of data points in order to

be considered a relevant candidate. The subsequent k-dimensional grid structures are

then pieced together in order to create the broader contours of the subspace

clusters in the data. A related method known as ENCLUS [32] was also proposed, in

which the subspace clusters are quantified with the use of an entropy measure, rather

than a density-based measure. Such an entropy-based measure sometimes has

advantages because of better normalization. Since then, a significant amount of work

has been done in the area of subspace clustering. These techniques have been used

both in the context of biclustering [93, 126] of discrete data, and in the context

of projected clustering [137]. Such methods have also been used for a variety of
