USING CHUNKSIM TO ANALYZE PERFORMANCE AND AVAILABILITY

In this section we show results from using ChunkSim to analyze a shared-nothing (SN) system running a typical data warehouse. We consider the dataset from the decision support benchmark TPC-H, with a scale factor of 25. Since the TPC-H dataset is not a pure star schema, we chose a subset of it that can be configured as a star. We used the relations Lineitem, Orders, Region, Nation, Customer, and Supplier, with Lineitem and Orders equi-partitioned on their join key. We also defined a target chunk size of 250 MB for relation Lineitem (we rounded this value to get 118 chunks). This determined the chunk hash ranges on the orderkey attribute, which in turn determined the chunks for the Orders relation, so that Lineitem and Orders became equi-partitioned on their equi-join attribute. This allowed us to process the schema similarly to a star schema, where the fact is represented by the join between Lineitem and Orders and the dimensions are the remaining relations. Table 2 shows the sizes and number of chunks in the experimental setup.

Table 2. Database setup for TPC-H SF 25

Relation    Placement      Size         N Chunks    Chunk Size
Lineitem    partitioned    29 GB        118         250 MB
Orders      partitioned    7.1 GB       118         62 MB
Customer    copied         780 MB       -           -
Supplier    copied         50.4 MB      -           -
Nation      copied         very small   -           -
Region      copied         very small   -           -
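
As a rough illustration of how such a chunking scheme can be derived, the following sketch fixes the number of chunks from Lineitem's size and the target chunk size, and then maps tuples to chunks by hashing orderkey. The function names and the use of a simple modulo hash (rather than the explicit hash ranges described above) are our own simplifications, not the exact procedure used in the experiments.

    import zlib

    TARGET_CHUNK_MB = 250
    LINEITEM_MB = 29 * 1024  # ~29 GB
    # Lineitem's size divided by the target chunk size, rounded;
    # the chapter rounds this value to 118 chunks.
    N_CHUNKS = 118

    def chunk_of(orderkey: int, n_chunks: int = N_CHUNKS) -> int:
        # Hash the partitioning attribute and map it to a chunk id.
        # Lineitem and Orders both hash orderkey, so matching tuples
        # land in the same chunk and the two relations stay
        # equi-partitioned on their equi-join attribute.
        return zlib.crc32(str(orderkey).encode()) % n_chunks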

The data sets were installed on 16 PCs from our lab. All PCs ran Windows XP with 2 GB of RAM, 200 GB disks, and Ultra-ATA storage controllers, but half of the nodes had Pentium IV processors at 3 GHz, while the other half had Pentium D processors at 3.40 GHz. The database engine was PostgreSQL, configured with shared_buffers=32MB, temp_buffers=100MB, and work_mem=100MB. The system was optimized by installing indexes and analyzing the schema to allow the system to compute useful statistics.

For runs on the nodes we used DWPA (Furtado 2007), a parallel database middleware prototype developed in our research lab and targeted at shared-nothing environments. The prototype takes as input actions, which may be local SQL queries (queries to be executed on individual nodes) or data transfer operations between any nodes. When a load-balanced query action is requested, a controller node processes the query as discussed in previous sections.
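
To make the notion of actions more concrete, the sketch below models the two kinds of input the prototype accepts, plus the load-balanced query action handled by a controller node. All class and field names here are hypothetical; the chapter does not show DWPA's actual interface.

    from dataclasses import dataclass

    @dataclass
    class LocalQueryAction:
        node: str   # node that executes the query
        sql: str    # local SQL query to run on that node

    @dataclass
    class DataTransferAction:
        source: str   # sending node
        target: str   # receiving node
        payload: str  # identifies the data to move, e.g. a chunk

    @dataclass
    class LoadBalancedQueryAction:
        # Submitted to a controller node, which rewrites the query
        # into per-node LocalQueryActions as discussed in the
        # previous sections.
        sql: str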

The performance indexes were then obtained by running a TPC-H workload round-robin for 10 hours. The resulting per-node performance indexes were:

{1, 1.037, 1.21, 1.047, 1.123, 1.0023, 1.051, 1.102, 1.867, 1.85, 1.93, 1.85, 1.79, 2.1, 2.02, 1.86}
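
The chapter does not spell out how these indexes are computed from the 10-hour run; a plausible reading, assumed in the sketch below, is that each node's index is its average query time normalized to the fastest node, so that the fastest node scores 1 and a node scoring 2.1 is roughly twice as slow. The example timings are hypothetical values chosen to reproduce the vector above.

    def performance_indexes(avg_times):
        # Normalize per-node average query times to the fastest node.
        fastest = min(avg_times)
        return [t / fastest for t in avg_times]

    # Hypothetical per-node average TPC-H query times (seconds);
    # the first eight nodes correspond to the faster half of the
    # cluster, the last eight to the slower half.
    times = [100.0, 103.7, 121.0, 104.7, 112.3, 100.23, 105.1, 110.2,
             186.7, 185.0, 193.0, 185.0, 179.0, 210.0, 202.0, 186.0]
    print(performance_indexes(times))  # -> 1, 1.037, 1.21, ...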

ChunkSim also collected Workload Properties (WP) statistics from this test run. With that information and different Placement Layout (PL) and Replication Layout (RL) configuration alternatives, we then ran a set of experiments to illustrate the results given by ChunkSim. From the ChunkSim result reports, we built charts that allow us to compare the alternatives tested by the simulator more easily.
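
To give a feel for what a PL and RL alternative might look like as simulator input, the sketch below expresses each layout as a chunk-to-node mapping: a round-robin placement and a replication layout that puts each chunk's replica on the next node. This representation is our own illustration, not ChunkSim's actual input format.

    N_NODES = 16
    N_CHUNKS = 118

    # Placement Layout (PL): primary copy of each chunk, round-robin.
    placement = {c: c % N_NODES for c in range(N_CHUNKS)}

    # Replication Layout (RL): one replica per chunk on the next
    # node, so a replica never shares a node with its primary copy.
    replication = {c: (c + 1) % N_NODES for c in range(N_CHUNKS)}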

In the next subsections we present the results of experiments with System Size Planning (SSZP), Performance Analysis of