Applications of Frequent Pattern Mining - Frequent Pattern Mining

Database Reference

In-Depth Information

of the explanatory patterns for the T i being an outlier. The patterns with the top- k

values of contradictiveness are reported as the corresponding explanatory patterns.

At an intuitive level, such an approach is analogous to non-membership of data

points in clusters in order to define outliers, rather than directly trying to determine

the deviation or sparsity level of the transactions. As was discussed in the chapter on

clustering-based methods, such an approach may sometimes not be able to distinguish

between noise and anomalies in the data. However, the approach in [ 63 ] indirectly

uses the weight and number of clusters in the outlier score. Furthermore, it uses

multiple patterns in order to provide an ensemble score. This is at least partially able

to alleviate the noise effects. In the context of very sparse transactional data, in which

direct exploration of rare subspaces is infeasible, such an approach would seem to

be a reasonable adaptation of subspace methods.

Frequent pattern mining methods are closely related to information theoretic mea-

sures for anomaly detection. This is because frequent patterns can be viewed as a

code-book in terms of which to represent the data in a compressed form. It has been

shown in [ 115 ], how frequent patterns can be used in order to create a compressed

representation of the data set. Therefore, a natural extension is to use the description

length [ 116 ] of the compressed data in order to compute the outlier scores. This

approach was further improved in [ 17 ]. Pattern mining methods have also been used

recently in the context of temporal data for outlier analysis [ 54 ].

6

Frequent Patterns for Indexing

Indexing algorithms typically require a concise representation of the data for mining

purposes. Typically, clustering methods are used in order to create concise represen-

tations of the data for indexing purposes [ 8 , 103 ]. The idea here is that the database

of transactions are partitioned into groups on the basis of the broad patterns in them.

This grouping is then used in order to perform branch-and-bound search during

similarity-based query processing. Frequent pattern mining methods can also be an

effective method to create such representations, since it is closely connected to the

clustering problem. However, the methods in [ 8 , 103 ] directly use clustering meth-

ods. Nevertheless, the use of frequent pattern mining would be a natural approach in

the context of market basket data.

Such methods have however been used quite successfully in the context of graph

indexing methods. A particular indexing structure which uses discriminative frequent

patterns is the gIndex method [ 132 ]. The key idea here is that structures which are

very similar will contain similar kinds of discriminative patterns. Therefore, such an

approach defines similarity in the context of discriminative patterns in the underlying

data. This work is also able to handle the fact that infrequent patterns may sometimes

be relevant to similarity by using a size increasing support function, in which the

support level depends upon the size of the pattern. Such an approach has low support

for small patterns, but higher support for longer patterns. This broader approach can

Search WWH ::

Custom Search

Home