therefore, we have used the Multi-Hash-Map from Apache as the hash table in
our experiments.
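As a rough illustration of what a multi-valued hash table provides, the sketch below keeps every value stored under the same key instead of overwriting the previous one. It is a minimal Python stand-in, not the Apache class used in the experiments; the class name `MultiMap`, its methods, and the sample tuples are hypothetical.

```python
from collections import defaultdict


class MultiMap:
    """Hash table that keeps every value stored under a key (illustrative only)."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def put(self, key, value):
        # Append rather than overwrite, so tuples sharing a join key coexist.
        self._buckets[key].append(value)

    def get(self, key):
        # dict.get does not trigger the default factory, so probing a
        # missing key does not create an empty bucket.
        return list(self._buckets.get(key, ()))

    def remove(self, key):
        # Drop and return all values stored under the key.
        return self._buckets.pop(key, [])


# Hypothetical stream tuples; two of them share the join key 42.
stream_table = MultiMap()
stream_table.put(42, "tuple-a")
stream_table.put(42, "tuple-b")
stream_table.put(7, "tuple-c")
matches = stream_table.get(42)
```

A probe with a key from the disk-based relation then returns all matching stream tuples in one look-up.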
Measurement strategy: We define the performance of the algorithms as the
service rate, with a higher service rate being better. The service rate was
measured as the number of tuples processed per second. In each experiment, the
algorithm was executed for one hour. We started our measurements after 20
minutes and kept measuring for 20 minutes. For added accuracy, we took three
readings for each specification and then calculated the average. Where
required, we also calculated the confidence interval at the 95% confidence
level. The calculation of the confidence interval is based on 4000 measurements
for one setting. Moreover, no other application was running in parallel during
the execution of the algorithm.
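The measurement procedure can be sketched as follows. This is an illustrative Python outline, not the authors' measurement code: the function names, the tuple counts, and the use of a normal approximation (z = 1.96) for the 95% interval are all assumptions.

```python
import math
import statistics

WARMUP_SECONDS = 20 * 60   # readings before this point are discarded
WINDOW_SECONDS = 20 * 60   # length of the measured window


def service_rate(tuples_processed, window_seconds):
    """Tuples processed per second over the measured window."""
    return tuples_processed / window_seconds


def mean_with_ci95(samples):
    """Return (mean, half-width) of a 95% confidence interval.

    Uses the normal approximation, which is an assumption here; it is
    reasonable for large samples such as 4000 measurements.
    """
    mean = statistics.fmean(samples)
    stderr = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, 1.96 * stderr


# Hypothetical tuple counts from three runs of the same setting.
runs = [service_rate(n, WINDOW_SECONDS)
        for n in (12_000_000, 12_600_000, 11_400_000)]
average_rate = statistics.fmean(runs)
```

The reported value for a setting is then `average_rate`, with `mean_with_ci95` applied to the individual measurements where an interval is required.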
5.2 Experimental Results
In our experimental study, we analyze the results from three different
perspectives. Firstly, we compare the performance of both HYBRIDJOIN and
X-HYBRIDJOIN with that of the other related algorithms. Secondly, we examine
the role of the non-swappable part of the disk buffer in stream processing.
Finally, we validate our cost model by comparing its predictions with
measurements.
Performance comparisons: The two possible parameters that can vary and
directly affect the performance of the algorithms under test are the total avail-
able memory for the algorithm and the size of the disk-based relation. In our
experiments, we tested the algorithms for different values of these parameters
and compared their performance.
Performance comparisons for varying size of the disk-based relation: In
the experiment shown in Figure 3(a), the total memory allocated to the join was
fixed while the size of the disk-based relation R was increased exponentially.
Figure 3(a) shows that, for all sizes of R, the performance of X-HYBRIDJOIN is
substantially better than that of all the other approaches. Another key
observation from the figure is that when R contains 0.5 million tuples the
performance of HYBRIDJOIN is almost 70% of that of X-HYBRIDJOIN, and when R
contains 8 million tuples this percentage decreases to 50%. This means that, as
R grows, the performance of the other algorithms decreases more sharply than
that of X-HYBRIDJOIN.
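The percentages quoted above are ratios of service rates. The short Python sketch below shows the arithmetic; the doubling step between relation sizes and the example service rates are assumptions chosen for illustration, not values read from Figure 3(a).

```python
def relative_performance(rate, baseline_rate):
    """Service rate of one algorithm as a percentage of a baseline's rate."""
    return 100.0 * rate / baseline_rate


# Sizes of R from 0.5 million to 8 million tuples, assuming each step doubles.
relation_sizes = [500_000 * 2**i for i in range(5)]

# Hypothetical service rates (tuples per second) at the smallest and largest R.
small_r = relative_performance(rate=7_000, baseline_rate=10_000)
large_r = relative_performance(rate=2_000, baseline_rate=4_000)
```

With these placeholder rates, the ratio falls from 70% to 50% even though both algorithms slow down, which is the pattern the text describes.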
Performance comparisons when the size of available memory varies:
In our second experiment, we analysed the performance of X-HYBRIDJOIN
using different memory budgets while the size of R is fixed (2 million tuples).
Figure 3(b) presents the results of our experiment. The figure indicates that,
for all memory budgets, the performance of X-HYBRIDJOIN is again significantly
better than that of all the other algorithms. This improvement is consistent
with the intuition behind X-HYBRIDJOIN. According to our calculations,
introducing the non-swappable part in X-HYBRIDJOIN can save about 33% of the
disk I/O cost.
Although keeping the non-swappable part in memory increases the look-up cost
 