Applications of Frequent Pattern Mining - Frequent Pattern Mining

way. The precise method for prioritizing and weighting the rules may vary quite

significantly in different applications.

Frequent Patterns for Outlier Analysis

Frequent pattern mining techniques are frequently used for outlier analysis in binary

and transaction data. Since transaction data is inherently high-dimensional, it is

natural to utilize subspace methods in order to identify the relevant outliers. The

challenge in subspace methods is that it is no longer computationally practical or

statistically feasible to define subspaces (or sets of items), which are sparse for

outlier detection. For example, in a sparse transaction database containing hundreds

of thousands of items, sparse itemsets are the norm rather than the exception. Therefore,

a subspace exploration for sparse itemsets is likely to report the vast majority of

patterns. The work in [ 62 ] addresses this challenge by working in terms of the

relationship of transactions to dense subspaces, rather than sparse subspaces. In

other words, this is a reverse approach that determines the transactions that are not

included in most of the relevant dense subspace clusters of the data. In the context

of transaction data, subspace clusters are essentially frequent patterns.

The idea in such methods is that frequent patterns are less likely to occur in

outlier transactions, as compared to normal transactions. Therefore, a measure has

been proposed in [ 63 ], which sums up the support of all frequent patterns occurring

in a given transaction in order to provide the outlier score of that transaction. The total

sum is normalized by dividing by the number of frequent patterns. However, this

term can be omitted from the final score, since it is the same across all transactions.

Let $\mathcal{D}$ be a transaction database containing the transactions denoted by $T_1 \ldots T_N$. Let $s(X, \mathcal{D})$ represent the support of itemset $X$ in $\mathcal{D}$. Therefore, if $FPS(\mathcal{D}, s_m)$ represents the set of frequent patterns in the database $\mathcal{D}$ at the minimum support level $s_m$, then the frequent pattern outlier factor $FPOF(T_i)$ of a transaction $T_i \in \mathcal{D}$ at minimum support $s_m$ is defined as follows:

$$FPOF(T_i) = \frac{\sum_{X \in FPS(\mathcal{D}, s_m),\; X \subseteq T_i} s(X, \mathcal{D})}{|FPS(\mathcal{D}, s_m)|}$$

Intuitively, a transaction containing a large number of frequent patterns with high

support will have a high value of $FPOF(T_i)$. Such a transaction is unlikely to be an

outlier, because it reflects the major patterns in the data.
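The score defined above can be illustrated with a minimal Python sketch. It assumes that support is measured as the fraction of transactions containing an itemset, and it enumerates frequent patterns by brute force up to a small size cap, which is only workable for toy data. The function names (`support`, `frequent_patterns`, `fpof`), the `max_size` cap, and the example transactions are all hypothetical and are not taken from [ 63 ].

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_patterns(transactions, min_support, max_size=3):
    """Brute-force search for frequent itemsets (assumption: toy data only).

    Returns a dict mapping each frequent itemset to its support.
    """
    items = sorted(set().union(*transactions))
    patterns = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            s = support(candidate, transactions)
            if s >= min_support:
                patterns[candidate] = s
    return patterns


def fpof(transaction, patterns):
    """Sum of the supports of the frequent patterns contained in the
    transaction, divided by the total number of frequent patterns."""
    if not patterns:
        return 0.0
    contained = sum(s for x, s in patterns.items() if x <= transaction)
    return contained / len(patterns)


# Hypothetical toy database: four similar baskets and one unusual one.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "eggs"}),
    frozenset({"bread", "milk"}),
    frozenset({"bread", "eggs"}),
    frozenset({"caviar"}),
]
patterns = frequent_patterns(transactions, min_support=0.4)
scores = [fpof(t, patterns) for t in transactions]
```

In this toy database, the last transaction contains none of the frequent patterns and therefore receives the lowest score, which marks it as the strongest outlier candidate. A practical implementation would replace the brute-force enumeration with a proper frequent pattern mining algorithm.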

As in other subspace methods, such an approach can also be used in order to describe why a data point may not be considered an outlier. Intuitively, the frequent patterns with the largest support that are not included in the transaction $T_i$ are considered contradictory patterns to $T_i$. Let $S$ be a frequent pattern not contained in $T_i$. Therefore, $S - T_i$ is non-empty, and the contradictiveness of frequent pattern $S$ to the transaction $T_i$ is defined by $s(S, \mathcal{D}) \cdot |S - T_i|$. Therefore, a transaction which does not have many items in common with a very frequent itemset is likely to be one
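The contradictiveness measure can be sketched in a few lines of Python. The supports below are hypothetical values supplied by hand rather than mined from data, and the helper names `contradictiveness` and `top_contradictory` are illustrative only.

```python
def contradictiveness(pattern, pattern_support, transaction):
    """Support of the pattern times the number of its items that are
    missing from the transaction, i.e. s(S, D) * |S - T_i|."""
    missing = pattern - transaction
    return pattern_support * len(missing)


def top_contradictory(patterns, transaction, k=3):
    """Return the k frequent patterns, not contained in the transaction,
    with the highest contradictiveness (ties broken by item names)."""
    scored = [
        (contradictiveness(p, s, transaction), sorted(p))
        for p, s in patterns.items()
        if not p <= transaction  # only patterns not contained in T_i
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


# Hypothetical frequent patterns with their supports (assumed values).
patterns = {
    frozenset({"bread"}): 0.8,
    frozenset({"milk"}): 0.6,
    frozenset({"eggs"}): 0.4,
    frozenset({"bread", "milk"}): 0.6,
    frozenset({"bread", "eggs"}): 0.4,
}
outlier = frozenset({"caviar"})
top = top_contradictory(patterns, outlier)
```

Here the pattern {bread, milk} ranks first for the transaction {caviar}, because it combines a high support with two missing items.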
