Database Reference
In-Depth Information
(line 1). In the beginning all slots in the hash table H are empty; therefore,
h S is assigned to w (line 2). In the abstract level description, the algorithm
contains two kind of loops. One is called the outer loop, which is an endless
loop (line 3). The key objective of the outer loop is to build the stream in the
hash table. Within the outer loop, the algorithm runs two independent inner
loops. One loop implements the probing module for the non-swappable part of
the disk buffer, while the other inner loop implements the probing of the swap-
pable part of the disk buffer. As the outer loop begins, the algorithm observes
the status of the stream buffer. If stream is available the algorithm reads the
w tuples from the stream buffer and loads them into the hash table, while also
enqueuing their attribute values into the queue. After completing the stream
input the algorithm resets w to 0 (lines 4 to 7). The algorithm then executes the
first inner loop, in which it reads all tuples one-by-one from the non-swappable
part of the disk buffer and looks them up into the hash table. In the case of
a match, the algorithm generates the join output. Due to the multi-hash-map,
there can be more than one match against one disk tuple. After generating the
join output the algorithm deletes all matched tuples from the hash table, along
with the corresponding nodes from the queue. The algorithm also increments w
with the number of vacated slots in the hash table (lines 8 to 14). Before starting
the second inner loop, the algorithm reads the oldest value of the join attribute
from the queue (line 3) and loads a disk partition p i (where 2
n )intothe
swappable part of the disk buffer, using that join attribute value as an index
(lines 15 and 16). As the specified disk partition is loaded into the swappable
part of the disk buffer, the algorithm starts the second inner loop and repeats
all the steps described in the first inner loop (lines 17 to 23).
i
Note: If we switch-off the first inner loop (lines 8 to 14), the algorithm works
as HYBRIDJOIN.
4.3
Cost Model
In this section we derive the general formulae for calculating the cost for our
proposed X-HYBRIDJOIN. We derive equations for memory and processing time
of X-HYBRIDJOIN. Equation 1 describes the total memory used to implement
the algorithm except for the stream buffer; whereas Equation 2 calculates the
processing cost for w tuples. The symbols used to measure the costs are specified
in Table 1.
Memory cost. In X-HYBRIDJOIN, the disk buffer is divided into two equal
parts. One is swappable, the other is non-swappable. As said before, the largest
share of the total memory is used for the hash table; a much smaller portion is
used for the disk buffer. The queue size is a constant fraction of the hash table
size. The memory for each component of X-HYBRIDJOIN can be calculated as
shown below.
Memory reserved for the swappable and non-swappable parts= v P + v P =2 v P
(in the case of HYBRIDJOIN it is v P only).
Memory for the hash table = α ( M
2 v P ).
 
Search WWH ::




Custom Search