X-HYBRIDJOIN for Near-Real-Time Data Warehousing - Advances in Databases

Database Reference

In-Depth Information

Algorithm 1. Pseudo-code for X-HYBRIDJOIN

Input: A disk-based relation R with an index on join attribute and a stream of updates

S Output: SR

Parameters: w tuples of S and a partition p i of R

Method:

1: LOAD first partition p 1 of R into the non-swappable part of the disk buffer.

2: w ← h S

3: while (true) do

4: if (available stream tuples ≥ w ) then

5: READ w tuples from the stream buffer, load them into H and enqueue their

join attribute values into Q .

6: w ← 0

7: end if

8: for each tuple r in p 1 do

9: if r ∈ H then

10: OUTPUT rH

11: w ← w +number of matching tuples found in H

12: DELETE all matched tuples from H and the corresponding nodes from Q .

13: end if

14: end for

15: READ the oldest join attribute value from Q .

16: LOAD a disk partition p i (where 2 ≤ i ≤ n )of R into the swappable part of the

disk buffer using the join attribute value as an index.

17: for each tuple r in p i do

18: if r ∈ H then

19: OUTPUT rH

20: w ← w +number of matching tuples found in H

21: DELETE all matched tuples from H and the corresponding nodes from Q .

22: end if

23: end for

24: end while

The difference between the two algorithms is that in X-HYBRIDJOIN we di-

vide the disk buffer into two parts. One part stores the most popular pages

of disk-based relation R in memory permanently, and we call this the non-

swappable part of the disk buffer. The other part of the disk buffer is swappable

and is used to load partitions from the remainder of relation R into memory in

the same way as in the HYBRIDJOIN algorithm. As a natural default setting,

we assign the same amount of memory to both parts.

4.2

Algorithm

Once the available memory is distributed among the join components, the algo-

rithm is ready to execute according to the procedure described in Algorithm 1.

Before starting the actual join execution, the algorithm reads a particular por-

tion of the disk-based relation R into the non-swappable part of the disk buffer

Advances in Databases

Search WWH ::

Custom Search

Home