Compact Representations of Sequential Classification Rules - Data Mining: Foundations and Practice

Databases Reference

In-Depth Information

3.2 Sequence Support

The concept of support is bound to dataset

. In particular, for a sequence

X the support in a dataset

which

contain X [4]. Hence, we need to define when an input-sequence contains a

sequence. Analogously to the concept of sequence containment introduced

in Definition 3, an input-sequence S contains a sequence X when the events

in X match the events in S based on a given matching function. However,

in an input-sequence S events are characterized by their position within S .

This information can be exploited to constrain the occurrence of an arbitrary

sequence X in the input-sequence S .

Commonly considered constraints are maximum and minimum gap con-

straints and windows constraints [17, 25]. Maximum and minimum gap con-

straints specify the maximum and minimum number of events in S which

may occur between two consecutive events in X . The window constraint spec-

ifies the maximum number of events in S which may occur between the first

and last event in X . For example sequence ADA occurs in the input-sequence

S = ADCBA , and satisfies a minimum gap constraint equal to 1, a maximum

gap constraint equal to 3 and a window constraint equal to 4.

In the following we formalize the concept of gap constrained occurrence

of a sequence into an input-sequence. Similarly to Definition 3, we introduce

a set of possible matching function to check when an input-sequence S in

is the number of input-sequences in

contains an arbitrary sequence X . With respect to Definition 3, these matching

functions may incorporate gap constraints. Formally, a gap constraint on a

sequence X and an input-sequence S can be formalized as Gapθ K ,where Gap

is the number of events in S between either two consecutive elements of X (i.e.,

maximum and minimum gap constraints), or the first and last elements of X

(i.e., window constraint), θ is a relational operator (i.e., θ

∈{

≥

, = ,

≤

}

and K is the maximum/minimum acceptable gap.

Definition 4 (Gap Constrained Subsequence). Let X =( x 1 ,...,x m ) be

an arbitrary sequence and S =( s 1 ,...,s l ) an arbitrary input-sequence in

with arbitrary length m

l.LetΦ be a set of matching functions between two

arbitrary sequences, and Gap θ K be a gap constraint. Sequence X occurs in

S under the constraint Gap θ K, written as X Φ S, if there is a function

≤

Φ S and (b) depending on the constraint type, ϕ

satisfies one of the following conditions

∈

Φ such that (a) X

•∀

∈{

1 ,...,m

−

}

, ( ϕ ( j +1)

−

ϕ ( j ))

≤

K, for maximum gap constraint

•∀

∈{

1 ,...,m

−

}

, ( ϕ ( j +1)

−

ϕ ( j ))

≥

K, for minimum gap constraint

•

( ϕ ( m )

−

ϕ (1))

≤

K, for window constraint

When no gap constraint is enforced, the definition above corresponds to

Definition 3. When consecutive events in X are adjacent in input-sequence S ,

then X is a string sequence in S [32]. This case is given when the maximum

gap constraint is enforced with maximum gap K = 1. Finally, when set Φ is the

Data Mining: Foundations and Practice

Search WWH ::

Custom Search

Home