Database Reference
In-Depth Information
Queue
. . . .
t m
t 3
t 2
t 1
H ash tabl e
Join
output
Stream
S
Hash
function
………...
………...
………...
Stream buffer
Disk buffer
Non-
swappable
Swappable
Join window
Disk-based relation
R
p 1
p 2
…..
p n
Fig. 2. Architecture of HYBRIDJOIN and X-HYBRIDJOIN. The only difference be-
tween the two algorithms is that in X-HYBRIDJOIN the disk buffer is split and its
two parts are treated differently, as explained in the text.
tail. For exponent 1 we get the distribution of Zipf's law, which gave rise to the
general term Zipfian distribution. In sales, the 80/20 rule is used to model the
scenario where the frequency of selling a small number of products is signifi-
cantly higher compared to the rest of the product, often simplified in the 80/20
rule [13]. The 80/20 rule of thumb has commonly been observed in commercial
applications [13] [14]. The 80/20 rule corresponds to an exponent slightly smaller
than 1 [13].
Our aim is to describe an algorithm that takes advantage of the likely dis-
tribution of the data. Therefore we created a dataset generator that can create
artificial data sets following a power law with an exponent that can be chosen
freely. In all our experiments, the master data is sorted with respect to the access
frequency.
4X-HYBRIDJOIN
In this section, we describe our second algorithm, X-HYBRIDJOIN which is an
extension of HYBRIDJOIN.
4.1
Difference between X-HYBRIDJOIN and HYBRIDJOIN
As we will see later, the service rate (number of tuples processed per second)
of HYBRIDJOIN increases as the exponent of the distribution goes above 1
i.e. as the distribution gets closer to a short-tailed distribution. However, if a
distribution is fairly short-tailed, then many matches are with the most frequent
tuples. So the question arises, how much can be gained in terms of performance
by buffering the most frequent tuples permanently, and this gives rise to X-
HYBRIDJOIN.
 
Search WWH ::




Custom Search