Database Reference
In-Depth Information
Algorithm 1.
Pseudo-code for X-HYBRIDJOIN
Input:
A disk-based relation
R
with an index on join attribute and a stream of updates
S
Output:
SR
Parameters:
w
tuples of
S
and a partition
p
i
of
R
Method:
1: LOAD first partition
p
1
of
R
into the non-swappable part of the disk buffer.
2:
w ← h
S
3:
while
(true)
do
4:
if
(available stream tuples
≥ w
)
then
5: READ
w
tuples from the stream buffer, load them into
H
and enqueue their
join attribute values into
Q
.
6:
w ←
0
7:
end if
8:
for
each tuple
r
in
p
1
do
9:
if
r ∈ H
then
10: OUTPUT
rH
11:
w ← w
+number of matching tuples found in
H
12: DELETE all matched tuples from
H
and the corresponding nodes from
Q
.
13:
end if
14:
end for
15: READ the oldest join attribute value from
Q
.
16: LOAD a disk partition
p
i
(where 2
≤ i ≤ n
)of
R
into the swappable part of the
disk buffer using the join attribute value as an index.
17:
for
each tuple
r
in
p
i
do
18:
if
r ∈ H
then
19: OUTPUT
rH
20:
w ← w
+number of matching tuples found in
H
21: DELETE all matched tuples from
H
and the corresponding nodes from
Q
.
22:
end if
23:
end for
24:
end while
The difference between the two algorithms is that in X-HYBRIDJOIN we di-
vide the disk buffer into two parts. One part stores the most popular pages
of disk-based relation
R
in memory permanently, and we call this the non-
swappable part of the disk buffer. The other part of the disk buffer is swappable
and is used to load partitions from the remainder of relation
R
into memory in
the same way as in the HYBRIDJOIN algorithm. As a natural default setting,
we assign the same amount of memory to both parts.
4.2
Algorithm
Once the available memory is distributed among the join components, the algo-
rithm is ready to execute according to the procedure described in Algorithm 1.
Before starting the actual join execution, the algorithm reads a particular por-
tion of the disk-based relation
R
into the non-swappable part of the disk buffer