way. The precise method for prioritizing and weighting the rules may vary quite
significantly in different applications.
5 Frequent Patterns for Outlier Analysis
Frequent pattern mining techniques are often used for outlier analysis in binary
and transaction data. Since transaction data is inherently high-dimensional, it is
natural to use subspace methods to identify the relevant outliers. The challenge
is that it is neither computationally practical nor statistically meaningful to
search for sparse subspaces (or sets of items) for outlier detection. For example,
in a sparse transaction database containing hundreds of thousands of items, sparse
itemsets are the norm rather than the exception. Therefore, a subspace exploration
for sparse itemsets is likely to report the vast majority of patterns. The work in
[62] addresses this challenge by working in terms of the relationship of
transactions to dense subspaces, rather than sparse subspaces. In other words, this
is a reverse approach that determines the transactions that are not included in
most of the relevant dense subspace clusters of the data. In the context of
transaction data, subspace clusters are essentially frequent patterns.
The idea in such methods is that frequent patterns are less likely to occur in
outlier transactions than in normal transactions. Therefore, a measure has
been proposed in [63], which sums up the support of all frequent patterns occurring
in a given transaction in order to provide the outlier score of that transaction. The
total sum is normalized by dividing by the number of frequent patterns. However, this
normalization term can be omitted from the final score, since it is the same across
all transactions.
Let $\mathcal{D}$ be a transaction database containing the transactions denoted by
$T_1 \ldots T_N$. Let $s(X, \mathcal{D})$ represent the support of itemset $X$ in
$\mathcal{D}$. Therefore, if $FPS(\mathcal{D}, s_m)$ represents the set of frequent
patterns in the database $\mathcal{D}$ at the minimum support level $s_m$, then the
frequent pattern outlier factor $FPOF(T_i)$ of a transaction $T_i \in \mathcal{D}$
at minimum support $s_m$ is defined as follows:

$$FPOF(T_i) = \frac{\sum_{X \in FPS(\mathcal{D}, s_m),\, X \subseteq T_i} s(X, \mathcal{D})}{|FPS(\mathcal{D}, s_m)|}$$
Intuitively, a transaction containing a large number of frequent patterns with high
support will have a high value of $FPOF(T_i)$. Such a transaction is unlikely to be an
outlier, because it reflects the major patterns in the data.
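
The score described above can be illustrated with a minimal Python sketch. It is
not the implementation from [63]: the function names (support, frequent_patterns,
fpof) and the toy database are hypothetical, support is assumed to be the fraction
of transactions containing an itemset, and frequent patterns are found by
brute-force enumeration, which is only workable for a handful of items. A real
miner such as Apriori or FP-growth would take its place on actual data.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_patterns(transactions, min_support):
    """Map each frequent itemset to its support (brute force, toy data only)."""
    items = sorted(set().union(*transactions))
    patterns = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            s = support(candidate, transactions)
            if s >= min_support:
                patterns[candidate] = s
                found_any = True
        if not found_any:
            # Downward closure: if no itemset of this size is frequent,
            # no larger itemset can be frequent either.
            break
    return patterns


def fpof(transaction, patterns):
    """Sum of supports of the frequent patterns contained in `transaction`,
    divided by the total number of frequent patterns."""
    if not patterns:
        return 0.0
    contained = sum(s for x, s in patterns.items() if x <= transaction)
    return contained / len(patterns)


# Hypothetical toy database: four similar transactions and one unusual one.
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ab"),
    frozenset("de"),
]

patterns = frequent_patterns(transactions, min_support=0.4)
scores = [fpof(t, patterns) for t in transactions]

# Low scores first: these are the outlier candidates.
for score, t in sorted(zip(scores, map(sorted, transactions))):
    print(f"{''.join(t):>4}  FPOF = {score:.3f}")
```

In this toy database the transaction {d, e} contains none of the frequent patterns
and therefore receives a score of 0, which makes it the strongest outlier
candidate. The transactions {a, b, c} contain every frequent pattern and receive
the highest score.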
As in other subspace methods, such an approach can also be used to explain why a
data point may be considered an outlier. Intuitively, the frequent patterns with
the largest support that are not included in the transaction $T_i$ are considered
contradictory patterns to $T_i$. Let $S$ be a frequent pattern not contained in
$T_i$. Therefore, $S - T_i$ is non-empty, and the contradictiveness of frequent
pattern $S$ to the transaction $T_i$ is defined by
$s(S, \mathcal{D}) * |S - T_i|$. Therefore, a transaction which
does not have many items in common with a very frequent itemset is likely to be one