Database Reference
In-Depth Information
of the explanatory patterns for the T i being an outlier. The patterns with the top- k
values of contradictiveness are reported as the corresponding explanatory patterns.
At an intuitive level, such an approach is analogous to non-membership of data
points in clusters in order to define outliers, rather than directly trying to determine
the deviation or sparsity level of the transactions. As was discussed in the chapter on
clustering-based methods, such an approach may sometimes not be able to distinguish
between noise and anomalies in the data. However, the approach in [ 63 ] indirectly
uses the weight and number of clusters in the outlier score. Furthermore, it uses
multiple patterns in order to provide an ensemble score. This is at least partially able
to alleviate the noise effects. In the context of very sparse transactional data, in which
direct exploration of rare subspaces is infeasible, such an approach would seem to
be a reasonable adaptation of subspace methods.
Frequent pattern mining methods are closely related to information theoretic mea-
sures for anomaly detection. This is because frequent patterns can be viewed as a
code-book in terms of which to represent the data in a compressed form. It has been
shown in [ 115 ], how frequent patterns can be used in order to create a compressed
representation of the data set. Therefore, a natural extension is to use the description
length [ 116 ] of the compressed data in order to compute the outlier scores. This
approach was further improved in [ 17 ]. Pattern mining methods have also been used
recently in the context of temporal data for outlier analysis [ 54 ].
6
Frequent Patterns for Indexing
Indexing algorithms typically require a concise representation of the data for mining
purposes. Typically, clustering methods are used in order to create concise represen-
tations of the data for indexing purposes [ 8 , 103 ]. The idea here is that the database
of transactions are partitioned into groups on the basis of the broad patterns in them.
This grouping is then used in order to perform branch-and-bound search during
similarity-based query processing. Frequent pattern mining methods can also be an
effective method to create such representations, since it is closely connected to the
clustering problem. However, the methods in [ 8 , 103 ] directly use clustering meth-
ods. Nevertheless, the use of frequent pattern mining would be a natural approach in
the context of market basket data.
Such methods have however been used quite successfully in the context of graph
indexing methods. A particular indexing structure which uses discriminative frequent
patterns is the gIndex method [ 132 ]. The key idea here is that structures which are
very similar will contain similar kinds of discriminative patterns. Therefore, such an
approach defines similarity in the context of discriminative patterns in the underlying
data. This work is also able to handle the fact that infrequent patterns may sometimes
be relevant to similarity by using a size increasing support function, in which the
support level depends upon the size of the pattern. Such an approach has low support
for small patterns, but higher support for longer patterns. This broader approach can
Search WWH ::




Custom Search