case. On the other hand, if $X$ is evenly distributed among all the partitions, then $p_X(i) = 1/n$, $1 \le i \le n$, and the value of $H(X) = \log(n)$ is the maximum in this case.
Therefore the following metric can be used to measure the skewness of a data partition.
Definition 12.8 Given a database partition $D_i$, $(1 \le i \le n)$, the skewness $S(X)$ of an itemset $X$ is defined by
$$S(X) = \frac{H_{\max} - H(X)}{H_{\max}} \qquad (12.9)$$
where $H(X) = -\sum_{i=1}^{n} \left( p_X(i) \times \log(p_X(i)) \right)$ and $H_{\max} = \log(n)$.
The skewness $S(X)$ has the following properties:
(1) $S(X) = 0$, when all $p_X(i)$, $1 \le i \le n$, are equal. So the skewness is at its lowest value when $X$ is distributed evenly in all partitions.
(2) $S(X) = 1$, if $\exists k \in [1, n]$ such that $p_X(k) = 1$, and $p_X(i) = 0$ for $\forall i \ne k$, $1 \le i \le n$. So the skewness is at its highest value when $X$ occurs only in one partition.
(3) $0 < S(X) < 1$, in all the other cases.
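The skewness metric and its three properties can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function name `skewness` and the input format (a list holding $p_X(1), \ldots, p_X(n)$) are assumptions made for illustration, and terms with $p_X(i) = 0$ are taken to contribute 0 to $H(X)$, which is the convention property (2) relies on.

```python
import math


def skewness(probs):
    """Return S(X) = (H_max - H(X)) / H_max for one itemset X.

    probs[i] is assumed to hold p_X(i+1), the fraction of the
    occurrences of X that fall in partition i+1 (hypothetical input format).
    """
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two partitions, since log(1) = 0")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("the values p_X(i) must sum to 1")
    # Assumption: 0 * log(0) is treated as 0, so empty partitions are skipped.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    h_max = math.log(n)
    return (h_max - entropy) / h_max
```

Under these assumptions, `skewness([0.25, 0.25, 0.25, 0.25])` is 0 up to rounding, `skewness([1.0, 0.0, 0.0, 0.0])` is 1, and a mixed distribution such as `[0.7, 0.1, 0.1, 0.1]` gives a value strictly between the two.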
Workload balance is a measurement of the distribution of the support
clusterings of the large itemsets over the partitions at the processors. We define
$$W_i = \sum_{X \in L_s} w(X) \times p_X(i)$$
to be the itemset workload in a partition $D_i$, where $L_s$ is the set of all the large itemsets. Intuitively, the workload $W_i$ in partition $D_i$ is the ratio of the total support of the large itemsets in $D_i$ to their total support over all the partitions. Note that $\sum_{i=1}^{n} W_i = 1$.
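As an illustration of the workload definition, the sketch below computes $W_1, \ldots, W_n$ from per-itemset weights and distributions. The function name `itemset_workloads`, the dictionary-based input format, and the sample values are hypothetical. The sketch also assumes that the weights $w(X)$ sum to 1 over $L_s$, which is what makes the workloads sum to 1.

```python
def itemset_workloads(weights, probs, n):
    """Return [W_1, ..., W_n] with W_i = sum over X in L_s of w(X) * p_X(i).

    weights: dict mapping each large itemset X to w(X) (assumed to sum to 1).
    probs:   dict mapping each large itemset X to the list p_X(1..n).
    Both layouts are illustrative assumptions, not taken from the text.
    """
    workloads = [0.0] * n
    for itemset, w in weights.items():
        p = probs[itemset]
        if len(p) != n:
            raise ValueError(f"itemset {itemset!r} needs {n} probabilities")
        for i in range(n):
            workloads[i] += w * p[i]
    return workloads


# Hypothetical example: two large itemsets over two partitions.
example_weights = {"A": 0.5, "B": 0.5}
example_probs = {"A": [1.0, 0.0], "B": [0.5, 0.5]}
example_workloads = itemset_workloads(example_weights, example_probs, 2)
```

In this made-up example itemset A occurs only in the first partition while B is spread evenly, so the first partition carries three quarters of the workload.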
Definition 12.9 For a database partition $D_i$, $(1 \le i \le n)$, of a database $D$, the workload balance factor $TB(D)$ of the partition is given by
$$TB(D) = \frac{-\sum_{i=1}^{n} W_i \log(W_i)}{\log(n)} \qquad (12.10)$$
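The workload balance factor can be evaluated in the same way as the skewness. The sketch below is only an illustration: the name `workload_balance` is hypothetical, the formula is read as a normalized entropy with a leading minus sign (so that the result lies between 0 and 1), and terms with $W_i = 0$ are assumed to contribute 0.

```python
import math


def workload_balance(workloads):
    """Return TB(D) = -sum(W_i * log(W_i)) / log(n).

    workloads is assumed to hold W_1..W_n and to sum to 1.
    """
    n = len(workloads)
    if n < 2:
        raise ValueError("need at least two partitions, since log(1) = 0")
    if abs(sum(workloads) - 1.0) > 1e-9:
        raise ValueError("the workloads W_i must sum to 1")
    # Assumption: 0 * log(0) is treated as 0, so idle partitions are skipped.
    entropy = -sum(w * math.log(w) for w in workloads if w > 0)
    return entropy / math.log(n)
```

Under this reading, equal workloads such as `[0.25, 0.25, 0.25, 0.25]` give a value of 1 up to rounding, while a workload concentrated in a single partition gives 0.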
The metric $TB(D)$ has the following properties: