Databases Reference
In-Depth Information
Proof.
Let each element
v
∈
[
N
] with a uniformed probability 1
/k
to be the
part of the element of
F
.Let
H
denote the number of different elements in
Q
⊂
[
N
]thathavebeenhitby
F
after repeating the same procedure
k
times.
The probability
p
that
hits each set
Q
of size at most
k
on each of its
elements could be bounded by
F
p
=
E
(
H
)
/E
(
k
)
=
{
i
=1
[(
k
1
/k
)
k−
1
−
i
+1)
/k
]
×
k
×
(1
/k
)
×
(1
−
}
/k
=
i
=1
{
i
+1)
/k
2
]
1
/k
)
k−
1
[(
k
−
×
(1
−
}
>
i
=1
{
i
+1)
/k
2
]
1
/k
)
k
[(
k
−
×
(1
−
}
>
i
=1
{
[(
k
−
i
+1)
/k
2
]
×
(1
/
4)
}
(
)
(
k/
2)
/k
2
]
=(1
/
4)
×
[(1 +
k
)
/
(2
k
)]
>
(1
/
4)
=(1
/
4)
×
[(1 +
k
)
×
×
[
k/
(2
k
)]
>
1
/
8,
1
/k
)
k
where inequality (
) follows from the fact that the sequence (1
−
is
monotonously increasing.
4 Proposed Rule Mining Approach
4.1 The Strategy of the Deterministic Approach
To identify the significant CARs for class
A
(as
c
i
∈
, we provide a de-
terministic approach that employs a single application of a “strong” (2
n
,k,
0)-
selector. This approach ensures that every potential significant rule for
A
will be hit at least once. To apply a family
C
)in
R
of subsets of [2
n
] means first
F
to arrange the sets of
F
into a sequence
F
1
,F
2
,...,F
|F|
.Inthe
i
th step,
only the CARs in
with labels in
F
i
will be involved in the procedure
SIGNIFICANCE-TEST, while other CARs can be ignored. Thus, we have
an
O
(
k
2
log 2
n
)-complexity to hit each of the
k
potential significant rules in-
dependently at least once, due to the property of the “strong” selector. If the
current test for
F
i
contains only one potential significant rule
R
j
and
R
j
is
also a significant rule for
A
, then we call the function LOG-TEST, which is
based on a binary search and finally find
R
j
. With a “smaller” list of rules (all
significant rules and some insignificant rules), we then compute the weighting
score of each rule for class
A
, and finally catch the “best
R
k
” (top-
k
score)
rules for
A
.
4.2 The Strategy of the Randomised Approach
In this section, we use the randomised
k
-selector to substitute the “strong”
selector in Sect. 4.1. This randomised approach ensures that every potential
significant rule for class
A
will be hit at least once with a high probability.