case. On the other hand, if $X$ is evenly distributed among all the partitions, then $p_X(i) = 1/n$, $1 \le i \le n$, and the value of $H(X) = \log(n)$ is the maximum in this case. Therefore the following metric can be used to measure the skewness of a data partition.

Definition 12.8 Given a database partition $D_i$ $(1 \le i \le n)$, the skewness $S(X)$ of an itemset $X$ is defined by

$$S(X) = \frac{H_{\max} - H(X)}{H_{\max}} \tag{12.9}$$

where $H(X) = -\sum_{i=1}^{n} \left( p_X(i) \times \log(p_X(i)) \right)$ and $H_{\max} = \log(n)$.
The skewness $S(X)$ has the following properties:
(1) $S(X) = 0$ when all $p_X(i)$, $1 \le i \le n$, are equal. So the skewness is at its lowest value when $X$ is distributed evenly in all partitions.
(2) $S(X) = 1$ if $\exists k \in [1, n]$ such that $p_X(k) = 1$ and $p_X(i) = 0$ for $\forall i \ne k$, $1 \le i \le n$. So the skewness is at its highest value when $X$ occurs only in one partition.
(3) $0 < S(X) < 1$ in all the other cases.
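The skewness metric and its three properties can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names `entropy` and `skewness` are hypothetical, the input is assumed to be the list of values $p_X(i)$ for one itemset (non-negative and summing to 1), and the usual convention $0 \times \log(0) = 0$ is assumed.

```python
import math


def entropy(p):
    """H(X) = -sum_i p_X(i) * log(p_X(i)), treating 0 * log(0) as 0 (assumed convention)."""
    return -sum(q * math.log(q) for q in p if q > 0)


def skewness(p):
    """S(X) = (H_max - H(X)) / H_max, with H_max = log(n) for n partitions."""
    n = len(p)
    if n < 2:
        raise ValueError("skewness needs at least two partitions")
    h_max = math.log(n)
    return (h_max - entropy(p)) / h_max


even = skewness([0.25, 0.25, 0.25, 0.25])      # property (1): close to 0
single = skewness([1.0, 0.0, 0.0, 0.0])        # property (2): close to 1
mixed = skewness([0.7, 0.1, 0.1, 0.1])         # property (3): strictly between 0 and 1
print(even, single, mixed)
```

Because $S(X)$ is a ratio of two logarithmic quantities, the result does not depend on the base of the logarithm; the natural logarithm is used here only for convenience.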
Workload balance is a measurement of the distribution of the support
clusterings of the large itemsets over the partitions at the processors. We define

$$W_i = \sum_{X \in L_s} w(X) \times p_X(i)$$

to be the itemset workload in a partition $D_i$, where $L_s$ is the set of all the large itemsets. Intuitively, the workload $W_i$ in partition $D_i$ is the ratio of the total supports of the large itemsets in $D_i$ over all the partitions. Note that $\sum_{i=1}^{n} W_i = 1$.
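The itemset workload is a weighted sum of the per-partition distributions of the large itemsets. The sketch below illustrates that sum under stated assumptions: the function name `workloads` and the dictionary layout are hypothetical, the weights $w(X)$ are taken as given inputs and assumed to sum to 1 over the large itemsets, and each distribution $p_X(i)$ is assumed to sum to 1 over the partitions. Under those two assumptions the workloads also sum to 1.

```python
def workloads(weights, dists):
    """Return [W_1, ..., W_n] with W_i = sum over X of w(X) * p_X(i).

    weights: dict mapping itemset -> w(X)            (assumed to sum to 1)
    dists:   dict mapping itemset -> [p_X(1..n)]      (each assumed to sum to 1)
    """
    n = len(next(iter(dists.values())))
    result = [0.0] * n
    for itemset, p in dists.items():
        if len(p) != n:
            raise ValueError("all distributions must cover the same partitions")
        for i in range(n):
            result[i] += weights[itemset] * p[i]
    return result


# Two hypothetical large itemsets over two partitions.
weights = {"A": 0.5, "B": 0.5}
dists = {"A": [0.5, 0.5], "B": [1.0, 0.0]}
w = workloads(weights, dists)
print(w, sum(w))
```

In this toy input, itemset B occurs only in the first partition, so that partition carries the larger share of the workload.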
Definition 12.9 For a database partition $D_i$ $(1 \le i \le n)$ of a database $D$, the workload balance factor $TB(D)$ of the partition is given by

$$TB(D) = \frac{-\sum_{i=1}^{n} W_i \log(W_i)}{\log(n)} \tag{12.10}$$
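Equation (12.10) is the entropy of the workload vector divided by $\log(n)$. A minimal sketch of that computation follows; the function name `workload_balance` is hypothetical, the input is assumed to be the list of workloads $W_i$ (non-negative and summing to 1), and $0 \times \log(0)$ is again assumed to be 0.

```python
import math


def workload_balance(w):
    """TB(D) = -sum_i W_i * log(W_i) / log(n), treating 0 * log(0) as 0 (assumed convention)."""
    n = len(w)
    if n < 2:
        raise ValueError("workload balance needs at least two partitions")
    h = -sum(x * math.log(x) for x in w if x > 0)
    return h / math.log(n)


balanced = workload_balance([0.25, 0.25, 0.25, 0.25])   # evenly spread workload
concentrated = workload_balance([1.0, 0.0, 0.0, 0.0])   # all workload in one partition
uneven = workload_balance([0.75, 0.25])                 # somewhere in between
print(balanced, concentrated, uneven)
```

For these inputs the evenly spread workload gives a value close to 1 and the fully concentrated one gives 0, which follows directly from the formula.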
The metric $TB(D)$ has the following properties: