X-HYBRIDJOIN for Near-Real-Time Data Warehousing - Advances in Databases - page 38

Database Reference

In-Depth Information

Queue

. . . .

t m

t 3

t 2

t 1

H ash tabl e

Join

output

Stream

S

Hash

function

………...

………...

………...

Stream buffer

Disk buffer

Non-

swappable

Swappable

Join window

Disk-based relation

R

p 1

p 2

…..

p n

Fig. 2. Architecture of HYBRIDJOIN and X-HYBRIDJOIN. The only difference be-

tween the two algorithms is that in X-HYBRIDJOIN the disk buffer is split and its

two parts are treated differently, as explained in the text.

tail. For exponent 1 we get the distribution of Zipf's law, which gave rise to the

general term Zipfian distribution. In sales, the 80/20 rule is used to model the

scenario where the frequency of selling a small number of products is signifi-

cantly higher compared to the rest of the product, often simplified in the 80/20

rule [13]. The 80/20 rule of thumb has commonly been observed in commercial

applications [13] [14]. The 80/20 rule corresponds to an exponent slightly smaller

than 1 [13].

Our aim is to describe an algorithm that takes advantage of the likely dis-

tribution of the data. Therefore we created a dataset generator that can create

artificial data sets following a power law with an exponent that can be chosen

freely. In all our experiments, the master data is sorted with respect to the access

frequency.

4X-HYBRIDJOIN

In this section, we describe our second algorithm, X-HYBRIDJOIN which is an

extension of HYBRIDJOIN.

4.1

Difference between X-HYBRIDJOIN and HYBRIDJOIN

As we will see later, the service rate (number of tuples processed per second)

of HYBRIDJOIN increases as the exponent of the distribution goes above 1

i.e. as the distribution gets closer to a short-tailed distribution. However, if a

distribution is fairly short-tailed, then many matches are with the most frequent

tuples. So the question arises, how much can be gained in terms of performance

by buffering the most frequent tuples permanently, and this gives rise to X-

HYBRIDJOIN.

Next Page

Advances in Databases

Search WWH ::

Custom Search

Home