discussed local pruning; FDM-LUP discussed local pruning and upper-bound
pruning; FDM-LPP discussed local pruning and step-by-step pruning.
12.6.1 Generation of candidate sets
It is important to observe some interesting properties of large itemsets in
distributed environments, since such properties may substantially reduce the
number of messages passed across the network when mining association rules.
There is an important relationship between large itemsets and the distributed
database: every globally large itemset must be locally large at some site. If an
itemset $X$ is both globally large and locally large at site $S_i$, then $X$ is
called gl-large at site $S_i$. The gl-large itemsets at a site form the basis for
that site to generate its own candidate sets.
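The definition of gl-large can be made concrete with a small sketch. It assumes a relative minimum support `min_sup` applied both to the whole database and to each site's partition; the two-site data, the helper names, and the brute-force enumeration are illustrative assumptions, not part of the algorithm described in the text.

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def large_itemsets(transactions, min_sup, max_size=2):
    """Brute-force: itemsets (up to max_size) with support >= min_sup * |transactions|."""
    items = sorted(set().union(*transactions))
    threshold = min_sup * len(transactions)
    return {
        frozenset(c)
        for k in range(1, max_size + 1)
        for c in combinations(items, k)
        if support_count(frozenset(c), transactions) >= threshold
    }

def gl_large(sites, min_sup, max_size=2):
    """Return (globally large itemsets, list of gl-large itemsets per site)."""
    all_tx = [t for site in sites for t in site]
    globally_large = large_itemsets(all_tx, min_sup, max_size)
    # gl-large at a site = locally large there AND globally large.
    per_site = [large_itemsets(site, min_sup, max_size) & globally_large
                for site in sites]
    return globally_large, per_site

# Hypothetical partitions of a database over two sites.
site_1 = [frozenset("AB"), frozenset("AB"), frozenset("AC"), frozenset("B")]
site_2 = [frozenset("AC"), frozenset("C"), frozenset("CD"), frozenset("AB")]

globally_large, per_site = gl_large([site_1, site_2], min_sup=0.5)
```

In this toy data, {A, B} is locally large at the first site but not globally large, so it is not gl-large there; and the union of the per-site gl-large sets equals the set of globally large itemsets, as the relationship above requires.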
There are two properties of locally large and gl-large itemsets. First, if
an itemset $X$ is locally large at site $S_i$, then all its subsets are also
locally large at site $S_i$. Second, if an itemset $X$ is gl-large at site $S_i$,
then all its subsets are also gl-large at site $S_i$. Obviously, a similar
relation holds in the centralized environment. Below we state an important
result that underlies an effective technique for generating candidate sets in a
distributed environment.
Let $GL_i$ denote the set of gl-large itemsets at site $S_i$, and $GL_{i(k)}$
denote the set of gl-large $k$-itemsets at site $S_i$. If $X \in L_{(k)}$, then
there exists a site $S_i$ such that all its size-$(k-1)$ subsets are gl-large at
site $S_i$, i.e., they belong to $GL_{i(k-1)}$.
In a straightforward adaptation of Apriori, the set of candidate sets at the
$k$-th iteration, denoted by $CA_{(k)}$, which stands for size-$k$ candidate sets
from Apriori, would be generated by applying the Apriori_gen function on
$L_{(k-1)}$. That is,
$$CA_{(k)} = \mathrm{Apriori\_gen}(L_{(k-1)}).$$
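As a reminder of what Apriori_gen does, here is a minimal sketch. It assumes itemsets are represented as `frozenset` objects and uses a simplified join (any two (k-1)-itemsets whose union has size k) rather than the classic prefix-based join; after the prune step both yield the same candidates. The function name and the example input are illustrative.

```python
from itertools import combinations

def apriori_gen(prev_large):
    """Candidate k-itemsets from a set of large (k-1)-itemsets (frozensets)."""
    prev_large = set(prev_large)
    if not prev_large:
        return set()
    k = len(next(iter(prev_large))) + 1
    # Join step: union two (k-1)-itemsets that differ in exactly one item.
    joined = {a | b for a in prev_large for b in prev_large if len(a | b) == k}
    # Prune step: keep a candidate only if all of its (k-1)-subsets are large.
    return {
        c for c in joined
        if all(frozenset(s) in prev_large for s in combinations(c, k - 1))
    }

# Hypothetical large 2-itemsets.
large_2 = {frozenset("AB"), frozenset("AC"), frozenset("BC"), frozenset("BD")}
candidates_3 = apriori_gen(large_2)
```

Here only {A, B, C} survives: {A, B, D} and {B, C, D} are produced by the join but pruned, because {A, D} and {C, D} are not large.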
At each site $S_i$, let $CG_{i(k)}$ be the set of candidate sets generated by
applying Apriori_gen on $GL_{i(k-1)}$, i.e.,
$$CG_{i(k)} = \mathrm{Apriori\_gen}(GL_{i(k-1)}),$$
where $CG$ stands for candidate sets generated from gl-large itemsets. Therefore
$CG_{i(k)}$ is generated from $GL_{i(k-1)}$. Since $GL_{i(k-1)} \subseteq L_{(k-1)}$,
$CG_{i(k)}$ is a subset of $CA_{(k)}$. In the following discussion, let $CG_{(k)}$
denote the set $\bigcup_{i=1}^{n} CG_{i(k)}$.
For every $k > 1$, the set of all large $k$-itemsets $L_{(k)}$ is a subset of
$CG_{(k)} = \bigcup_{i=1}^{n} CG_{i(k)}$, where
$CG_{i(k)} = \mathrm{Apriori\_gen}(GL_{i(k-1)})$. Therefore,
$$L_{(k)} \subseteq CG_{(k)} = \bigcup_{i=1}^{n} CG_{i(k)} = \bigcup_{i=1}^{n} \mathrm{Apriori\_gen}(GL_{i(k-1)}). \tag{12.5}$$
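The effect of (12.5) can be illustrated by comparing the union of per-site candidate sets with the candidate set a centralized Apriori would produce. The sketch below assumes two sites with hypothetical gl-large 1-itemsets and a compact `apriori_gen` helper written for this example; it is an illustration of the containment, not the full mining algorithm.

```python
from itertools import combinations

def apriori_gen(prev_large):
    """Candidate k-itemsets whose (k-1)-subsets all lie in `prev_large`."""
    prev_large = set(prev_large)
    if not prev_large:
        return set()
    k = len(next(iter(prev_large))) + 1
    joined = {a | b for a in prev_large for b in prev_large if len(a | b) == k}
    return {c for c in joined
            if all(frozenset(s) in prev_large for s in combinations(c, k - 1))}

# Hypothetical gl-large 1-itemsets at two sites (assumed inputs).
gl_1 = {frozenset("A"), frozenset("B"), frozenset("C")}
gl_2 = {frozenset("B"), frozenset("C"), frozenset("D")}

# Every globally large itemset is gl-large at some site, so the set of
# large 1-itemsets is the union of the per-site gl-large sets.
l_1 = gl_1 | gl_2

ca_2 = apriori_gen(l_1)                       # centralized candidates
cg_2 = apriori_gen(gl_1) | apriori_gen(gl_2)  # union of per-site candidates

print(len(ca_2), len(cg_2))
print(sorted("".join(sorted(c)) for c in ca_2 - cg_2))
```

With these inputs the centralized candidate set contains all six pairs over {A, B, C, D}, while the distributed one drops {A, D}: A is not gl-large at the second site and D is not gl-large at the first, so {A, D} cannot be locally large at either site and therefore cannot be globally large.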