Database Reference
In-Depth Information
many replicas there will be for dataset D. Given a replication degree r, there will be $\lceil r \times C_c \rceil$ chunk replicas in the system.
for processing, local query runtimes, and results transfer and merge times (according to the WP information). The output of the simulation is the average expected runtime value (secs).
Experiment: an experiment is a set of runs in which some parameter varies, in order to analyze the effect of that parameter on the performance or availability properties of the system. Additionally, for statistical relevance, ChunkSim always repeats the runs a predefined number of times (100 by default) for each parameter value.
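The Experiment definition reduces to a repeat-and-average loop. The Python sketch below assumes that a run can be modelled as a function of one parameter value and a random generator. `run_experiment` and `toy_run` are hypothetical names, and `toy_run` is only a placeholder runtime, not ChunkSim's cost model.

```python
import random
import statistics
from typing import Callable, Dict, Sequence

# A run maps (parameter value, random generator) to one simulated runtime in secs.
Run = Callable[[float, random.Random], float]


def run_experiment(run: Run, parameter_values: Sequence[float],
                   repetitions: int = 100, seed: int = 0) -> Dict[float, float]:
    """Repeat `run` for every parameter value and average the runtimes."""
    rng = random.Random(seed)  # a single seeded generator keeps experiments reproducible
    averages: Dict[float, float] = {}
    for value in parameter_values:
        samples = [run(value, rng) for _ in range(repetitions)]
        averages[value] = statistics.mean(samples)
    return averages


def toy_run(replication_degree: float, rng: random.Random) -> float:
    """Hypothetical stand-in for one run (not ChunkSim's model):
    a noisy runtime that shrinks as the replication degree grows."""
    return 100.0 / (1.0 + replication_degree) * rng.uniform(0.9, 1.1)


if __name__ == "__main__":
    for degree, avg in run_experiment(toy_run, [0.0, 0.5, 2.0]).items():
        print(f"r={degree}: {avg:.1f} s")
```

The default of 100 repetitions mirrors the default stated above; seeding one generator per experiment is a design choice that makes repeated experiments return identical averages.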
A replication degree of 0 means that the data set will have no replicas: no chunk is replicated into any other node. A replication degree of 0.5 means that half of the data set chunks will be replicated into additional nodes, and a replication degree of 2 means that every chunk of the data set will have two replicas, located in two other nodes. A replication degree of 15 in a 16-node system corresponds to full mirroring. In terms of size, a 100 GB data set will occupy a total of 1.6 TB with full mirroring, 300 GB with r = 2 and 150 GB with r = 0.5. The smaller the replication degree, the lower the loading and storage requirements.
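The replica count (the ceiling of r times the number of chunks Cc) and the storage figures above can be checked with a short calculation. This sketch assumes equal-size chunks and bounds r by the number of nodes minus one (full mirroring); the function names and the choice of 100 chunks in the example are illustrative assumptions.

```python
import math


def replica_chunks(r: float, chunk_count: int, nodes: int) -> int:
    """Chunk replicas created by replication degree r, i.e. ceil(r * Cc)."""
    if not 0 <= r <= nodes - 1:
        # r = nodes - 1 already means full mirroring (every node holds every chunk)
        raise ValueError("replication degree must lie in [0, nodes - 1]")
    # round() strips float noise in r * chunk_count before taking the ceiling
    return math.ceil(round(r * chunk_count, 9))


def total_size_gb(dataset_gb: float, r: float, chunk_count: int, nodes: int) -> float:
    """Storage for the original chunks plus their replicas (equal-size chunks assumed)."""
    chunk_gb = dataset_gb / chunk_count
    return chunk_gb * (chunk_count + replica_chunks(r, chunk_count, nodes))


if __name__ == "__main__":
    for r in (0, 0.5, 2, 15):
        print(r, total_size_gb(100, r, chunk_count=100, nodes=16))
```

With 100 chunks on 16 nodes, r = 15, 2 and 0.5 give 1,600, 300 and 150 GB for a 100 GB data set, matching the figures in the text.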
Replication alternatives follow the same logic as the placement alternatives: the placement alternatives Pl-C, Pl-H, Pl-W and Pl-Wf have the corresponding replication alternatives Rl-C, Rl-H and Rl-W, which are called replication policies.
This set of placement and replication alternatives is the basic set already implemented in ChunkSim and in the actual DWPA parallel data warehouse architecture prototype (Furtado 2007). Other semi-automated approaches can be added, for instance ones that take groups of nodes into account for availability or performance reasons.
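The text does not spell out how Rl-H chooses replica nodes. Purely as an illustration, the sketch below assumes a homogeneous round-robin rule: chunk i lives on node i mod N, its replicas go to the next nodes around the ring, and a fractional degree gives one extra replica to the first chunks. The function name and the rule itself are hypothetical, not ChunkSim's or DWPA's implementation.

```python
import math
from typing import List


def homogeneous_replicas(chunk_count: int, nodes: int, r: float) -> List[List[int]]:
    """Nodes holding each chunk, home node first (hypothetical round-robin rule)."""
    if not 0 <= r <= nodes - 1:
        raise ValueError("replication degree must lie in [0, nodes - 1]")
    total = math.ceil(round(r * chunk_count, 9))  # ceil(r * Cc) replicas overall
    # every chunk gets `base` replicas; the first `extra` chunks get one more
    base, extra = divmod(total, chunk_count)
    placement = []
    for chunk in range(chunk_count):
        home = chunk % nodes  # homogeneous round-robin home node
        copies = base + (1 if chunk < extra else 0)
        # replicas go to the next nodes in ring order, so no node holds a chunk twice
        placement.append([(home + k) % nodes for k in range(copies + 1)])
    return placement
```

For example, `homogeneous_replicas(8, 4, 0.5)` returns `[[0, 1], [1, 2], [2, 3], [3, 0], [0], [1], [2], [3]]`: half of the chunks get one replica each, on the node after their home node.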
ChunkSim offers the following experiments:
Performance Analysis of Replication Degrees (PARD): this experiment answers the question of how different replication degrees influence system performance.
Additional inputs: an array of replication degrees.
Outputs: a set of tuples (Replication Degree, Time PL-H (LP), Time FM (LP), Time (LP), Time Query). The “Time (LP)” and “Time Query” fields are the average expected runtimes of the Local Processing (LP) part of a query and of the whole query, respectively (the LP part of a query is the fraction that is processed locally at each node, before transfer and merge). The “Time PL-H (LP)” and “Time FM (LP)” fields are included for comparison purposes, since they represent the LP runtimes of the “Slow” Homogeneous Placement with no replicas (PL-H) and of the “Fast” Full Mirroring (FM), respectively.
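The PARD loop can be illustrated with a small event-based simulation of on-demand processing. This sketch rests on assumptions that do not come from the text: the round-robin placement (inlined so the block runs on its own), the pick order of an idle node (own unreplicated chunks, then own replicated chunks, then chunks it holds as a replica), a single deterministic run instead of 100 averaged runs, and no transfer or merge phase, so only the LP columns are produced and Time Query is omitted. `place`, `simulate_lp` and `pard` are hypothetical names; this is not the algorithm of Figure 4.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def place(chunks: int, nodes: int, r: float) -> List[List[int]]:
    """Hypothetical round-robin placement: home node i % nodes (listed first),
    then ceil(r * chunks) replicas spread over the following nodes."""
    base, extra = divmod(math.ceil(round(r * chunks, 9)), chunks)
    return [[(i % nodes + k) % nodes for k in range(base + (i < extra) + 1)]
            for i in range(chunks)]


def simulate_lp(costs: Sequence[float], speeds: Sequence[float],
                holders: List[List[int]]) -> float:
    """One run: event-based, on-demand local processing; returns the LP time in secs."""
    n = len(speeds)
    # Assumed preference: own unreplicated chunks first (stable sort by copy count),
    # so replicated chunks stay available for helper nodes as long as possible.
    own = [sorted((c for c, h in enumerate(holders) if h[0] == node),
                  key=lambda c: len(holders[c])) for node in range(n)]
    copies = [[c for c, h in enumerate(holders) if node in h[1:]] for node in range(n)]
    done = [False] * len(costs)
    events = [(0.0, node) for node in range(n)]  # (time the node becomes idle, node)
    heapq.heapify(events)
    finish = 0.0
    while events:
        now, node = heapq.heappop(events)
        chunk = next((c for c in own[node] + copies[node] if not done[c]), None)
        if chunk is None:
            continue  # no local work left for this node: it stops asking
        done[chunk] = True
        end = now + costs[chunk] / speeds[node]
        finish = max(finish, end)
        heapq.heappush(events, (end, node))
    return finish


def pard(degrees: Sequence[float], costs: Sequence[float],
         speeds: Sequence[float]) -> List[Tuple[float, float, float, float]]:
    """(RD, Time PL-H (LP), Time FM (LP), Time (LP)) for each replication degree."""
    n, m = len(speeds), len(costs)
    t_plh = simulate_lp(costs, speeds, place(m, n, 0))     # no replicas
    t_fm = simulate_lp(costs, speeds, place(m, n, n - 1))  # full mirroring
    return [(rd, t_plh, t_fm, simulate_lp(costs, speeds, place(m, n, rd)))
            for rd in degrees]


if __name__ == "__main__":
    # 16 equal chunks on 4 nodes, the last node running at half speed
    for row in pard([0, 0.5, 2], [1.0] * 16, [1.0, 1.0, 1.0, 0.5]):
        print(row)
```

With 16 equal chunks on four nodes, the last one at half speed, the sketch reports LP times of 8.0, 6.0 and 5.0 s for r = 0, 0.5 and 2, against 8.0 s for PL-H and 5.0 s for FM: even partial replication lets faster nodes take over replicated chunks from the slow node.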
ChunkSim Estimation of Performance and Availability
The ChunkSim simulator implements the data allocation alternatives (placement and replication) and collects the system configuration information (WP, CL and RL) that it needs to model the system. We further define a run and an experiment as the actions the simulator uses to produce an analysis report.
For illustration purposes, Table 1 shows an example of the output report of PARD (corresponding to the experimental setup presented later in section 5). In that table, the Replication Degree (RD) quantifies how much replication there is for the chunks in the SN system. For instance, RD=10% means that only 10% of the fact chunks have one copy, while 100% means
Run: a run is a simple event-based simulation of the on-demand, chunk-wise processing algorithm of Figure 4, simulating on-demand assignment of chunks to nodes