Databases Reference
In-Depth Information
of subsets of [2
n
] means first to arrange the sets of
To apply a family
F
F
,F
|F|
.Inthe
i
th step, each element of [2
n
] will be
contained in
F
i
with a uniformed probability 1
/k
and only the CARs in
into a sequence
F
1
,F
2
,
···
with
labels in
F
i
will be involved in the procedure SIGNIFICANCE-TEST, while
other CARs can be ignored. Thus, we have an
O
(
k
)-complexity to hit each of
the
k
potential significant rules independently once with a high probability,
due to the property of the randomised
k
-selector (see Theorem 1). With a list
of the extracted
O
(
k
) rules (all significant rules and some insignificant rules),
the weighting score of each rule for
A
is computed. The “best
k
” (top-
k
score)
rules are the significant rules for class
A
identified in
R
R
.
4.3 Rule Mining Algorithm
The function (Fig. 2) identifies a rule with a possible large score for class
A
.
Lemma 1.
If there exists only one potential significant rule R
j
in the cur-
rent test F
i
and R
j
is also a significant rule, the function
LOG-TEST
will
return R
j
.
Proof.
According to Definition 9, we know that
c
A
(
R
j
)
>
|I
(
A
)
|
h
=1
c
A
(
Item
h
I
(
A
)). Thus, the subset of
∈
which contains
R
j
is always chosen for further
binary test if
R
j
is the only potential significant rule including in the current
test
F
i
.
The procedure (Fig. 3) identifies all
k
significant rules for class
A
in
R
R
.
will catch all k signif-
Theorem 2.
The procedure
SIGNIFICANCE-TEST
icant rules.
Function
LOG-TEST(
F
i
,
R
);
input:
F
i
(the
i
th element in
F
) and set
R
;
output:
Rw
(a rule in
R
);
(1)begin
(2)
Rw
:=
null
;
(3)
Temp
:=
F
i
;
(4)
while
|Temp| >
1do
(5)
choose an arbitrary subset
Temp
0
with half CARs in
Temp
to test;
if
∀Item
h
∈Temp
0
c
A
(
Item
h
)
≥
∀Item
h
∈Temp−Temp
0
c
A
(
Item
h
)
(6)
(7) then
Temp← Temp
0
;
(8) else
Temp← Temp− Temp
0
;
(9) end while
(10)
Rw ← Temp
;
(11) return (
Rw
);
(12)end
Fig. 2.
The LOG-TEST function