Supervised Pattern Mining and Applications to Classification - Frequent Pattern Mining

Database Reference

In-Depth Information

1. Mine a set of supervised patterns satisfying certain constraints

2. Select some patterns out of this set following certain criteria

and iterative pattern are derived set mining :

1. Mine a (set of) supervised pattern(s) satisfying certain constraints

2. Modify the constraints or data

3. Return to 1.

The main argument in favor of the post-processing approach is its efficiency. It allows

to run a pattern mining algorithm only once and hence avoids the possibly time

consuming repeated execution of pattern mining algorithms. The main arguments

in favor of iterative mining algorithms are their potentially higher accuracy and

their potential to use parameter-free pattern mining algorithms; in many of these

algorithms, it is not necessary to define a minimum support threshold in advance.

Both separate-and-conquer and divide-and-conquer techniques have been used

within either of these categories.

Most of these techniques can be understood in terms of the partition that a set

of patterns induces on the data. We therefore first need to introduce the concept of

equivalence relations and partitions :

∼

Definition 3.1

An equivalence relation on

is a binary relation

such that for

all d 1 , d 2 , d 3 ∈ D

, the relation is:

1. Reflexive: d 1 ∼ d 1 .

2. Symmetric: d 1 ∼ d 2 ⇒ d 2 ∼ d 1 .

3. Transitive: d 1 ∼ d 2 ∧ d 2 ∼ d 3 ⇒ d 1 ∼ d 3 .

The equivalence relation partitions

into disjunct subsets called equivalence classes

or blocks. The equivalence class of an element d

d ∈ D |

∈ D

is given as [ d ]

d }

Intuitively, transactions are in an equivalence class if they can not be distinguished

from each other. We can use patterns to create a new database, in which each trans-

action is described by a list of patterns present in it. We consider two transactions

equivalent in this new representation if they are described using the same lists of

patterns.

More formally, an individual pattern r induces an equivalence relation

∼

. The set of blocks is called partition or quotient set, and is denoted by

∼

∀

d 1 , d 2 ∈

, d 1

∼ r d 2

⇔

match ( r , d 1 )

match ( r , d 2 ), and so does a set of patterns

∀

match ( r , d 2 )).

In fact, the partitioning of a data set into classes that we defined in Definition 2.3

is induced by an equivalence relation based on the class labels.

In a supervised setting, it are derived is important to distinguish blocks which are

pure and which are not pure. A block is pure if all examples in it have the same class

label. Within a supervised setting it is important that the partition induced by a set

of patterns contains mostly pure blocks: if two examples with different class labels

contain exactly the same set of patterns, it will be impossible for a deterministic

algorithm to predict both correctly.

Most pattern set mining techniques can be summarized in the following manner:

d 1 , d 2 ∈ D

, d 1 ∼ P d 2 ⇔

(

∀

∈ P

: match ( r , d 1 )

Frequent Pattern Mining

Search WWH ::

Custom Search

Home