Database Reference
In-Depth Information
x
5
x
4
x
5
x
4
( )
x
3
x
3
( )
x
2
x
2
( )
x
1
x
1
( )
x
1
... ... ...
x
t
x
1
x
t
... ... ...
w
em
The samples of
The samples of
w
em
(a) The upper bound.
(b) The lower bound.
Fig. 8.3
The upper/lower bound of
F
P
m
+
1
(
l
)
.
The number of buckets is at most 2
t
, as shown in the following lemma.
(
>
>
)
φ
=
Lemma 8.2 (Number of buckets).
Given w
P
m
with n samples
n
2
t
0
, let
i
[
x
z
i
,
x
z
i
](
≤
≤
)
≤
2
t.
Proof.
In the worst case, each bucket only contains one value in
w
P
m
, which means
that the probability sum of any two consecutive values in
w
P
m
1
i
b
be b exclusive buckets satisfying Equation 8.3, then b
1
is greater than
t
.
Then, if the number of values in
w
P
m
is greater than 2
t
, the probability sum of all
values in
w
P
m
will be greater than 1, which conflicts with the fact that
w
P
m
is a
discrete random variable.
Constructing the buckets only requires one scan of all values in
w
P
m
. The min-
imal value in bucket
φ
i
=[
x
z
i
,
x
z
i
]
is min
(φ
i
)=
x
z
i
, and the maximal value in
φ
i
is
max
(φ
i
)=
x
z
i
. When computing the probability distribution of
w
P
m
+
1
using
w
P
m
,we
only select one value in each bucket
φ
i
⊂
w
P
m
as a representative, and assign
Pr
(φ
i
)
to the representative. If min
(φ
i
)
is selected as a representative, then the so computed
F
P
m
+
1
(
l
)
is greater than
F
P
m
+
1
(
l
)
; if max
(φ
i
)
is used as a representative, then the so
computed
F
P
m
+
1
(
l
)
is smaller than
F
P
m
+
1
(
l
)
.
Example 8.2 (Bucket Approximation). Consider path P
m
and edge e
m
. Let
f
P
m
|
e
m
(
|
)=
.
≤
≤
=
x
j
y
0
2
for
1
j
5
.Ift
2
, then all values in w
P
m
are divided into
three buckets:
[
x
1
,
]
,
[
,
]
,
[
,
]
.
.
.
2
, respectively.
If the minimal value of each bucket is used as a representative, then x
1
,x
3
and
x
5
are selected. As a result, the l-weight constrained region of P
m
+
1
is increased
such that the shaded area is included. So the calculated F
P
m
+
1
(
x
2
x
3
x
4
x
5
x
5
, with probability
0
4
,
0
4
and
0
l
)
is greater than the
actual F
P
m
+
1
(
)
.
If the maximal value of each bucket is used as a representative, then x
2
,x
4
and
x
5
are selected. The l-weight constrained region of P
m
+
1
is decreased such that
the shaded area is excluded. So the calculated F
P
m
+
1
(
l
l
)
is smaller than the actual
F
P
m
+
1
(
)
l
.
Therefore, the average value of
F
P
m
+
1
(
and
F
P
m
+
1
(
l
)
l
)
can be used to approxi-
mate the actual
F
P
m
+
1
(
l
)
. The approximation quality is guaranteed by the following
lemma.