Table 2. Data specification

Parameter                          Value
Disk-based data
  Size of disk-based relation R    0.5 million to 8 million tuples
  Size of each tuple               120 bytes
Stream data
  Size of each tuple               20 bytes
  Size of each node in queue       12 bytes
Benchmark
  Based on                         Zipf's law
  Characteristics                  Bursty and self-similar
Every c_loop seconds the algorithm processes w tuples of stream S; therefore, the
service rate μ can be calculated by dividing w by the cost of one loop iteration,
as shown in Equation 3.

$$\mu = \frac{w}{c_{\text{loop}}} \tag{3}$$
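As a worked illustration of Equation 3, the following minimal Java sketch computes the service rate from w and the cost of one loop iteration. The class name, method name, and sample values are hypothetical and chosen only for illustration; they are not measurements from the experiments.

```java
// Minimal sketch of Equation 3: mu = w / c_loop.
// ServiceRate, serviceRate, and the sample values below are illustrative assumptions.
class ServiceRate {

    /** Returns the service rate mu in tuples per second. */
    static double serviceRate(long w, double cLoopSeconds) {
        if (cLoopSeconds <= 0) {
            throw new IllegalArgumentException("cLoopSeconds must be positive");
        }
        return w / cLoopSeconds;
    }

    public static void main(String[] args) {
        long w = 1_000;      // hypothetical: stream tuples processed per loop iteration
        double cLoop = 0.5;  // hypothetical: cost of one loop iteration, in seconds
        System.out.println("mu = " + serviceRate(w, cLoop) + " tuples/second");
    }
}
```

With these sample values, 1,000 tuples per 0.5-second iteration correspond to a service rate of 2,000 tuples per second.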
5 Experiments
We performed experiments to compare the performance of our algorithms with
MESHJOIN. We also validate the measured cost by comparing it with the cal-
culated cost for each algorithm. As mentioned before, we use synthetic data sets
with a known skew.
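To make "a known skew" concrete, the sketch below draws join-attribute ranks from a Zipfian distribution by inverting its cumulative distribution. It is only an assumption about how such a workload could be generated, not the generator used in the experiments; the class name, domain size, and seed are hypothetical, and the sketch does not model burstiness or self-similarity.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical Zipf sampler: rank k is drawn with probability proportional to 1 / k^exponent.
class ZipfSampler {
    private final double[] cdf;  // cdf[k - 1] = P(rank <= k)
    private final Random rng;

    ZipfSampler(int n, double exponent, long seed) {
        cdf = new double[n];
        double sum = 0.0;
        for (int k = 1; k <= n; k++) {
            sum += 1.0 / Math.pow(k, exponent);
            cdf[k - 1] = sum;
        }
        for (int k = 0; k < n; k++) {
            cdf[k] /= sum;       // normalise so the last entry is 1.0
        }
        rng = new Random(seed);
    }

    /** Draws a rank in [1, n]; rank 1 is the most frequent value. */
    int next() {
        double u = rng.nextDouble();
        int idx = Arrays.binarySearch(cdf, u);
        if (idx < 0) {
            idx = -idx - 1;      // insertion point: first cdf entry greater than u
        }
        return Math.min(idx, cdf.length - 1) + 1;
    }

    public static void main(String[] args) {
        ZipfSampler sampler = new ZipfSampler(1000, 1.0, 42L);
        for (int i = 0; i < 10; i++) {
            System.out.print(sampler.next() + " ");
        }
        System.out.println();
    }
}
```

Precomputing the cumulative distribution costs O(n) memory but makes each draw an O(log n) binary search, which is usually acceptable for domains of a few million keys.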
5.1 Experimental Setup
Hardware Specifications: We carried out our experiments on a Pentium IV
2×2.13 GHz machine running Windows XP. The maximum memory we allocated for
our experiments was 250 MB. We implemented the algorithms in Java. To measure
the memory and processing time, we used built-in plugins provided by Apache
and the Java API, respectively.
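The measurement code itself is not shown here. As a rough sketch, processing time can be taken with the standard System.nanoTime() call, and heap usage can be approximated through java.lang.Runtime. The Runtime-based memory reading is a stand-in assumption and not the Apache plugin mentioned above; the class name, method names, and placeholder workload are hypothetical.

```java
// Hypothetical measurement helper built only on the standard Java API.
class LoopTimer {

    /** Runs the task once and returns the elapsed wall-clock time in seconds. */
    static double timeSeconds(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1e9;
    }

    /** Approximate heap currently in use, in bytes (a coarse stand-in for a profiler). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        double seconds = timeSeconds(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;  // placeholder workload standing in for one loop iteration
            }
            if (sum < 0) {
                throw new IllegalStateException("unexpected overflow");
            }
        });
        System.out.printf("elapsed: %.6f s, heap in use: %d bytes, max heap: %d bytes%n",
                seconds, usedHeapBytes(), Runtime.getRuntime().maxMemory());
    }
}
```

A single timed run is noisy on a JVM because of just-in-time compilation and garbage collection, so in practice one would average over many iterations after a warm-up phase.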
Data specifications: The synthetic workload that we used to test the algorithms
was generated using Zipf's law with exponent 1. The generated stream
has two additional characteristics, burstiness and self-similarity. The
detailed specifications of the data set that we used for analysis are shown in
Table 2. The relation R is stored on disk in a MySQL 5.0 database. To measure
the cost of each I/O operation accurately, we set the fetch size for the
ResultSet equal to the size of one partition on disk. X-HYBRIDJOIN needs
to store multiple values in the hash table against one key value. However, the
hash table provided by the standard Java API does not support this feature;
 