Database Reference
In-Depth Information
Algorithm 1. Pseudo-code for X-HYBRIDJOIN
Input: A disk-based relation R with an index on join attribute and a stream of updates
S Output: SR
Parameters: w tuples of S and a partition p i of R
Method:
1: LOAD first partition p 1 of R into the non-swappable part of the disk buffer.
2: w ← h S
3: while (true) do
4: if (available stream tuples ≥ w ) then
5: READ w tuples from the stream buffer, load them into H and enqueue their
join attribute values into Q .
6: w ← 0
7: end if
8: for each tuple r in p 1 do
9: if r ∈ H then
10: OUTPUT rH
11: w ← w +number of matching tuples found in H
12: DELETE all matched tuples from H and the corresponding nodes from Q .
13: end if
14: end for
15: READ the oldest join attribute value from Q .
16: LOAD a disk partition p i (where 2 ≤ i ≤ n )of R into the swappable part of the
disk buffer using the join attribute value as an index.
17: for each tuple r in p i do
18: if r ∈ H then
19: OUTPUT rH
20: w ← w +number of matching tuples found in H
21: DELETE all matched tuples from H and the corresponding nodes from Q .
22: end if
23: end for
24: end while
The difference between the two algorithms is that in X-HYBRIDJOIN we di-
vide the disk buffer into two parts. One part stores the most popular pages
of disk-based relation R in memory permanently, and we call this the non-
swappable part of the disk buffer. The other part of the disk buffer is swappable
and is used to load partitions from the remainder of relation R into memory in
the same way as in the HYBRIDJOIN algorithm. As a natural default setting,
we assign the same amount of memory to both parts.
4.2
Algorithm
Once the available memory is distributed among the join components, the algo-
rithm is ready to execute according to the procedure described in Algorithm 1.
Before starting the actual join execution, the algorithm reads a particular por-
tion of the disk-based relation R into the non-swappable part of the disk buffer
 
Search WWH ::




Custom Search