X-HYBRIDJOIN for Near-Real-Time Data Warehousing - Advances in Databases - page 42


Table 2. Data specification

Parameter                          Value
Disk-based data
  Size of disk-based relation R    0.5 million to 8 million tuples
  Size of each tuple               120 bytes
Stream data
  Size of each tuple               20 bytes
  Size of each node in queue       12 bytes
Benchmark
  Based on                         Zipf's law
  Characteristics                  Bursty and self-similar
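To make the Zipf-based benchmark concrete, the following is a minimal sketch of how join keys following Zipf's law might be drawn, assuming a fixed key domain and the exponent of 1 stated in the experimental setup. The class name ZipfKeySampler, the seed parameter, and the inverse-CDF approach are illustrative assumptions, not the authors' generator, and the sketch does not model the bursty or self-similar arrival pattern.

```java
import java.util.Random;

/**
 * Illustrative sketch (not the authors' generator): draws keys 1..numKeys
 * with probability proportional to 1 / rank^exponent, i.e. Zipf's law.
 */
public class ZipfKeySampler {
    private final double[] cdf;   // cumulative probabilities; last entry is 1.0
    private final Random random;

    public ZipfKeySampler(int numKeys, double exponent, long seed) {
        cdf = new double[numKeys];
        double sum = 0.0;
        for (int rank = 1; rank <= numKeys; rank++) {
            sum += 1.0 / Math.pow(rank, exponent);
            cdf[rank - 1] = sum;
        }
        // Normalise so the running sums become a cumulative distribution.
        for (int i = 0; i < numKeys; i++) {
            cdf[i] /= sum;
        }
        random = new Random(seed);
    }

    /** Returns a key in [1, numKeys]; key 1 is the most frequent. */
    public int nextKey() {
        double u = random.nextDouble();
        // Binary search for the first index whose cumulative probability >= u.
        int lo = 0;
        int hi = cdf.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cdf[mid] < u) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo + 1;
    }
}
```

With exponent 1, the key of rank 1 is drawn roughly twice as often as the key of rank 2, which is the kind of skew the experiments rely on.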

In every $c_{loop}$ seconds the algorithm processes $w$ tuples of stream $S$; therefore, the service rate $\mu$ can be calculated by dividing $w$ by the cost of one loop iteration, as shown in Equation 3.

$$\mu = \frac{w}{c_{loop}} \qquad (3)$$
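As a small worked illustration of Equation 3, the sketch below computes the service rate from the number of stream tuples processed per iteration and the cost of one iteration in seconds. The class and method names are hypothetical, and the numbers in the usage note are made up for illustration rather than taken from the paper's measurements.

```java
/** Illustrative helper for Equation 3: mu = w / c_loop. */
public class ServiceRate {
    /**
     * @param tuplesPerIteration w, the stream tuples processed in one loop iteration
     * @param loopCostSeconds    c_loop, the cost of one loop iteration in seconds
     * @return mu, the service rate in tuples per second
     */
    public static double serviceRate(long tuplesPerIteration, double loopCostSeconds) {
        if (loopCostSeconds <= 0.0) {
            throw new IllegalArgumentException("loop cost must be positive");
        }
        return tuplesPerIteration / loopCostSeconds;
    }
}
```

For example, with the hypothetical values w = 5000 and c_loop = 0.25 seconds, `ServiceRate.serviceRate(5000, 0.25)` returns 20000.0 tuples per second.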

5 Experiments

We performed experiments to compare the performance of our algorithms with MESHJOIN. We also validate the measured cost by comparing it with the calculated cost for each algorithm. As mentioned before, we use synthetic data sets with a known skew.

5.1 Experimental Setup

Hardware specifications: We carried out our experiments on a Pentium-IV 2×2.13 GHz machine running Windows XP. The maximum memory we allocated for our experiments was 250 MB. We implemented the algorithm in Java. To measure memory and processing time, we used built-in plugins provided by Apache and the Java API, respectively.

Data specifications: The synthetic workload that we used to test the algorithms was generated using Zipf's law with exponent 1. The generated stream has two additional characteristics, burstiness and self-similarity. The detailed specifications of the data set that we used for analysis are shown in Table 2. The relation R is stored on disk in a MySQL 5.0 database. To measure the cost of each I/O operation accurately, we set the fetch size for the ResultSet equal to the size of one partition on disk. X-HYBRIDJOIN needs to store multiple values in the hash table against one key value. However, the hash table provided by the standard Java API does not support this feature;
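To illustrate the requirement of storing several values against one key, the following is a minimal sketch that layers a list of values over the standard HashMap. It is an assumption made for illustration only, not the structure used in the experiments; the class name MultiValueTable and its methods are hypothetical, and the code assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative multi-valued hash table: each key maps to a list of values. */
public class MultiValueTable<K, V> {
    private final Map<K, List<V>> buckets = new HashMap<>();

    /** Adds a value under the key, keeping any values already stored there. */
    public void put(K key, V value) {
        buckets.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Returns a read-only view of all values for the key (empty if none). */
    public List<V> get(K key) {
        List<V> values = buckets.get(key);
        if (values == null) {
            return Collections.<V>emptyList();
        }
        return Collections.unmodifiableList(values);
    }

    /** Removes one occurrence of the value; drops the key once its list is empty. */
    public boolean remove(K key, V value) {
        List<V> values = buckets.get(key);
        if (values == null || !values.remove(value)) {
            return false;
        }
        if (values.isEmpty()) {
            buckets.remove(key);
        }
        return true;
    }
}
```

In a stream join, several stream tuples that share a join key could be kept under that key in this way, so that a single probe with a disk tuple returns all of its matches.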
