USING CHUNKSIM TO ANALYZE PERFORMANCE AND AVAILABILITY

In this section we show results from using ChunkSim to analyze a shared-nothing (SN) system running a typical data warehouse. We consider the dataset from the decision support benchmark TPC-H, with a scale factor of 25. Since the TPC-H dataset is not a pure star schema, we chose a subset of it that can be configured as a star. We used the relations Lineitem, Orders, Region, Nation, Customer, and Supplier, with Lineitem and Orders equi-partitioned on their join key. We also defined a target chunk size of 250 MB for relation Lineitem (we rounded this value to get 118 chunks). This determined the chunk hash ranges on the orderkey attribute, which in turn determined the chunks for the Orders relation, so that Lineitem and Orders became equi-partitioned on their equi-join attribute. This allowed us to process the schema similarly to a star schema, where the fact is represented by the join between Lineitem and Orders and the dimensions are the remaining relations. Table 2 shows the sizes and number of chunks in the experimental setup.

Table 2. Database setup for TPC-H SF 25

Relation    Placement      Size         N Chunks    Chunk Size
Lineitem    partitioned    29 GB        118         250 MB
Orders      partitioned    7.1 GB       118         62 MB
Customer    copied         780 MB       -           -
Supplier    copied         50.4 MB      -           -
Nation      copied         very small   -           -
Region      copied         very small   -           -
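
As a rough illustration of how such a chunking scheme can be derived, the following sketch fixes the number of chunks from Lineitem's size and the target chunk size, and then maps tuples to chunks by hashing orderkey. The function names and the use of a simple modulo hash (rather than the explicit hash ranges described above) are our own simplifications, not the exact procedure used in the experiments.

    import zlib

    TARGET_CHUNK_MB = 250
    LINEITEM_MB = 29 * 1024  # ~29 GB
    # Lineitem's size divided by the target chunk size, rounded;
    # the chapter rounds this value to 118 chunks.
    N_CHUNKS = 118

    def chunk_of(orderkey: int, n_chunks: int = N_CHUNKS) -> int:
        # Hash the partitioning attribute and map it to a chunk id.
        # Lineitem and Orders both hash orderkey, so matching tuples
        # land in the same chunk and the two relations stay
        # equi-partitioned on their equi-join attribute.
        return zlib.crc32(str(orderkey).encode()) % n_chunks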

The data sets were installed on 16 PCs from our lab. All PCs ran Windows XP with 2 GB of RAM, 200 GB disks, and Ultra-ATA storage controllers, but half of the nodes had Pentium IV processors at 3 GHz, while the other half had Pentium D processors at 3.40 GHz. The database engine was PostgreSQL, configured with shared_buffers=32MB, temp_buffers=100MB, and work_mem=100MB. The system was optimized by installing indexes and analyzing the schema to allow the system to compute useful statistics.

For runs on the nodes we used DWPA (Furtado 2007), a parallel database middleware prototype developed in our research lab and targeted at shared-nothing environments. The prototype takes as input actions, which may be local SQL queries (queries to be executed on individual nodes) or data transfer operations between any nodes. When a load-balanced query action is requested, a controller node processes the query as discussed in previous sections.
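
To make the notion of actions more concrete, the sketch below models the two kinds of input the prototype accepts, plus the load-balanced query action handled by a controller node. All class and field names here are hypothetical; the chapter does not show DWPA's actual interface.

    from dataclasses import dataclass

    @dataclass
    class LocalQueryAction:
        node: str   # node that executes the query
        sql: str    # local SQL query to run on that node

    @dataclass
    class DataTransferAction:
        source: str   # sending node
        target: str   # receiving node
        payload: str  # identifies the data to move, e.g. a chunk

    @dataclass
    class LoadBalancedQueryAction:
        # Submitted to a controller node, which rewrites the query
        # into per-node LocalQueryActions as discussed in the
        # previous sections.
        sql: str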

The performance indexes were then obtained by running a TPC-H workload round-robin for 10 hours. The resulting per-node performance indexes were:

{1, 1.037, 1.21, 1.047, 1.123, 1.0023, 1.051, 1.102, 1.867, 1.85, 1.93, 1.85, 1.79, 2.1, 2.02, 1.86}
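
The chapter does not spell out how these indexes are computed from the 10-hour run; a plausible reading, assumed in the sketch below, is that each node's index is its average query time normalized to the fastest node, so that the fastest node scores 1 and a node scoring 2.1 is roughly twice as slow. The example timings are hypothetical values chosen to reproduce the vector above.

    def performance_indexes(avg_times):
        # Normalize per-node average query times to the fastest node.
        fastest = min(avg_times)
        return [t / fastest for t in avg_times]

    # Hypothetical per-node average TPC-H query times (seconds);
    # the first eight nodes correspond to the faster half of the
    # cluster, the last eight to the slower half.
    times = [100.0, 103.7, 121.0, 104.7, 112.3, 100.23, 105.1, 110.2,
             186.7, 185.0, 193.0, 185.0, 179.0, 210.0, 202.0, 186.0]
    print(performance_indexes(times))  # -> 1, 1.037, 1.21, ...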

ChunkSim also collected Workload Properties (WP) statistics from this test run. With that information and different Placement Layout (PL) and Replication Layout (RL) configuration alternatives, we then ran a set of experiments to illustrate the results given by ChunkSim. From the ChunkSim result reports, we built charts that allow us to compare the alternatives tested by the simulator more easily.
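
To give a feel for what a PL and RL alternative might look like as simulator input, the sketch below expresses each layout as a chunk-to-node mapping: a round-robin placement and a replication layout that puts each chunk's replica on the next node. This representation is our own illustration, not ChunkSim's actual input format.

    N_NODES = 16
    N_CHUNKS = 118

    # Placement Layout (PL): primary copy of each chunk, round-robin.
    placement = {c: c % N_NODES for c in range(N_CHUNKS)}

    # Replication Layout (RL): one replica per chunk on the next
    # node, so a replica never shares a node with its primary copy.
    replication = {c: (c + 1) % N_NODES for c in range(N_CHUNKS)}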

In the next subsections we present the results of experiments with System Size Planning (SSZP), Performance Analysis of