From the properties of closed itemsets, it follows that a rule set containing
only specialistic rules is a compact and lossless representation of R only
when anti-monotonic constraints (e.g., the support constraint) are applied. This
property is lost in the case of non-anti-monotonic constraints (e.g., the confidence
constraint). In the CCRS representation, each compact rule contains all the in-
formation needed to generate all the rules encoded in it, independently of
the other rules in the set. Hence, it is always possible to regenerate set R
starting from the CCRS rule set.
6 Mining Compact Representations
In this section we present an algorithm to extract the compact rule set and
the classification rule cover representations from a sequence dataset. The al-
gorithm works in a specific instance of our framework for sequential rule min-
ing. Recall that in our framework sequence mining is constrained by the pair
(Ψ, Φ). The set of matching functions Ψ defines the containment between a
sequence and an input-sequence. In the considered framework instance, func-
tions in Ψ yield a contiguous subsequence relation. Hence, the mined compact
representations yield contiguous closed sequences and contiguous generator
sequences. In this section, we will denote the mined sequences simply as gen-
erator or closed sequences since the contiguity constraint is assumed. Set Φ
contains all matching functions which satisfy the maximum gap constraint.
Hence, the gap-constrained subsequence relation X ⊑_Φ S (where X is a se-
quence and S an input-sequence) can be formalized as X ⊑ S with X satisfying
the maximum gap constraint in S. Furthermore, for readability, we omit set Φ
when denoting sequence support, rule support, and rule confidence.
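The two containment notions above can be sketched as follows. This is a minimal illustration only: the list-based encodings, the event-identifier array, and the function names are assumptions for the sketch, not the chapter's actual definitions.

```python
def contiguous_contains(x, s):
    """Contiguous subsequence relation induced by the functions in Psi:
    x must occur in input-sequence s as an unbroken run of events."""
    n = len(x)
    return any(s[i:i + n] == x for i in range(len(s) - n + 1))

def gap_contains(x, s, eids, max_gap):
    """Gap-constrained relation induced by Phi: x is a (not necessarily
    contiguous) subsequence of s, and each pair of consecutively matched
    events is at most max_gap event identifiers apart."""
    def search(xi, start, prev_eid):
        if xi == len(x):
            return True  # all items of x have been matched in order
        for j in range(start, len(s)):
            gap_ok = prev_eid is None or eids[j] - prev_eid <= max_gap
            if s[j] == x[xi] and gap_ok:
                if search(xi + 1, j + 1, eids[j]):
                    return True
        return False
    return search(0, 0, None)
```

For example, with s = ['c', 'a', 'b', 'd'] and unit-spaced event identifiers, ['a', 'b'] is a contiguous subsequence of s, while ['a', 'd'] is contained in s only under the gap-constrained relation, and only when the maximum gap is at least 2.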
The proposed algorithm is levelwise [5] and computes the set of closed
and generator sequences by increasing length. At each iteration, say itera-
tion k, the algorithm performs the following operations. (1) Starting from set
M_k of k-sequences, it generates set M_{k+1} of (k+1)-sequences. Then, (2) it
prunes from M_{k+1} the sequences encoding only infrequent classification rules.
This pruning method limits the number of iterations and avoids the genera-
tion of uninteresting (i.e., infrequent) rules. (3) The algorithm checks M_{k+1}
against M_k to identify the subset of closed sequences in M_k and the subset
of generator sequences in M_{k+1}. (4) Based on this knowledge, the algorithm
updates the CRC and CCRS sets.
Each sequence is provided with the necessary information to support the next
iteration of the algorithm and to compute the compact representations poten-
tially encoded by it. The following information is associated with a sequence X.
(a) A sequence identifier list (denoted id-list) recording the input-sequences
including X. The id-list is a set of triplets (SID, eid, Class), where SID is
the input-sequence identifier, eid is the event identifier for the first¹ item of
¹ As discussed afterwards, knowledge about the event identifiers for the other items
in X is not necessary.
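The id-list bookkeeping can be sketched as follows. The triplet layout follows the text; the function names are hypothetical, and the support definition (number of distinct supporting input-sequences) is a standard assumption rather than a quotation from the chapter.

```python
from collections import namedtuple

# Triplet layout from the text: input-sequence id, event id of the first
# matched item, and the input-sequence's class label.
IdListEntry = namedtuple("IdListEntry", ["SID", "eid", "Class"])

def support(id_list):
    """Assumed definition: number of distinct input-sequences including X."""
    return len({entry.SID for entry in id_list})

def class_supports(id_list):
    """Distinct supporting input-sequences per class label: the counts
    needed to score the classification rules X -> c encoded by X."""
    by_class = {}
    for entry in id_list:
        by_class.setdefault(entry.Class, set()).add(entry.SID)
    return {c: len(sids) for c, sids in by_class.items()}

# Example: X occurs in input-sequences 1 (at two positions), 2, and 3.
idl = [IdListEntry(1, 4, "pos"), IdListEntry(1, 9, "pos"),
       IdListEntry(2, 2, "pos"), IdListEntry(3, 5, "neg")]
```

Under this encoding, support(idl) counts input-sequence 1 only once despite its two occurrences, which is consistent with treating the id-list as a set of supporting input-sequences.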