Databases Reference
In-Depth Information
Proof. Let each element v
[ N ] with a uniformed probability 1 /k to be the
part of the element of
F
.Let H denote the number of different elements in
Q
[ N ]thathavebeenhitby
F
after repeating the same procedure k times.
The probability p that
hits each set Q of size at most k on each of its
elements could be bounded by
F
p = E ( H ) /E ( k )
=
{ i =1 [( k
1 /k ) k− 1
i +1) /k ]
×
k
×
(1 /k )
×
(1
}
/k
= i =1 {
i +1) /k 2 ]
1 /k ) k− 1
[( k
×
(1
}
> i =1 {
i +1) /k 2 ]
1 /k ) k
[( k
×
(1
}
> i =1 {
[( k
i +1) /k 2 ]
×
(1 / 4)
}
( )
( k/ 2) /k 2 ]
=(1 / 4) × [(1 + k ) / (2 k )]
> (1 / 4)
=(1 / 4)
×
[(1 + k )
×
×
[ k/ (2 k )]
> 1 / 8,
1 /k ) k
where inequality ( ) follows from the fact that the sequence (1
is
monotonously increasing.
4 Proposed Rule Mining Approach
4.1 The Strategy of the Deterministic Approach
To identify the significant CARs for class A (as c i
, we provide a de-
terministic approach that employs a single application of a “strong” (2 n ,k, 0)-
selector. This approach ensures that every potential significant rule for A
will be hit at least once. To apply a family
C )in
R
of subsets of [2 n ] means first
F
to arrange the sets of
F
into a sequence F 1 ,F 2 ,...,F |F| .Inthe i th step,
only the CARs in
with labels in F i will be involved in the procedure
SIGNIFICANCE-TEST, while other CARs can be ignored. Thus, we have
an O ( k 2 log 2 n )-complexity to hit each of the k potential significant rules in-
dependently at least once, due to the property of the “strong” selector. If the
current test for F i contains only one potential significant rule R j and R j is
also a significant rule for A , then we call the function LOG-TEST, which is
based on a binary search and finally find R j . With a “smaller” list of rules (all
significant rules and some insignificant rules), we then compute the weighting
score of each rule for class A , and finally catch the “best
R
k ” (top- k score)
rules for A .
4.2 The Strategy of the Randomised Approach
In this section, we use the randomised k -selector to substitute the “strong”
selector in Sect. 4.1. This randomised approach ensures that every potential
significant rule for class A will be hit at least once with a high probability.
Search WWH ::




Custom Search