Database Reference
In-Depth Information
1. Mine a set of supervised patterns satisfying certain constraints
2. Select some patterns out of this set following certain criteria
and iterative pattern are derived set mining :
1. Mine a (set of) supervised pattern(s) satisfying certain constraints
2. Modify the constraints or data
3. Return to 1.
The main argument in favor of the post-processing approach is its efficiency. It allows
to run a pattern mining algorithm only once and hence avoids the possibly time
consuming repeated execution of pattern mining algorithms. The main arguments
in favor of iterative mining algorithms are their potentially higher accuracy and
their potential to use parameter-free pattern mining algorithms; in many of these
algorithms, it is not necessary to define a minimum support threshold in advance.
Both separate-and-conquer and divide-and-conquer techniques have been used
within either of these categories.
Most of these techniques can be understood in terms of the partition that a set
of patterns induces on the data. We therefore first need to introduce the concept of
equivalence relations and partitions :
D
Definition 3.1
An equivalence relation on
is a binary relation
such that for
all d 1 , d 2 , d 3 D
, the relation is:
1. Reflexive: d 1 d 1 .
2. Symmetric: d 1 d 2 d 2 d 1 .
3. Transitive: d 1 d 2 d 2 d 3 d 1 d 3 .
The equivalence relation partitions
into disjunct subsets called equivalence classes
or blocks. The equivalence class of an element d
D
d D |
D
is given as [ d ]
={
d }
d
.
Intuitively, transactions are in an equivalence class if they can not be distinguished
from each other. We can use patterns to create a new database, in which each trans-
action is described by a list of patterns present in it. We consider two transactions
equivalent in this new representation if they are described using the same lists of
patterns.
More formally, an individual pattern r induces an equivalence relation
. The set of blocks is called partition or quotient set, and is denoted by
D
/
d 1 , d 2
D
, d 1
r d 2
match ( r , d 1 )
=
match ( r , d 2 ), and so does a set of patterns
P
:
match ( r , d 2 )).
In fact, the partitioning of a data set into classes that we defined in Definition 2.3
is induced by an equivalence relation based on the class labels.
In a supervised setting, it are derived is important to distinguish blocks which are
pure and which are not pure. A block is pure if all examples in it have the same class
label. Within a supervised setting it is important that the partition induced by a set
of patterns contains mostly pure blocks: if two examples with different class labels
contain exactly the same set of patterns, it will be impossible for a deterministic
algorithm to predict both correctly.
Most pattern set mining techniques can be summarized in the following manner:
d 1 , d 2 D
, d 1 P d 2
(
r
∈ P
: match ( r , d 1 )
=
Search WWH ::




Custom Search