Figure 5. ChunkSim workload properties, placement and replication layout parameters
planning and analysis.
The chunk key is the partitioning criteria value
that determined the chunk (ChunkSim needs
only a unique chunk identifier), and chunk size
is its resulting size in bytes. CPL is a set of pairs
{(C_l, N_j)}.
Replication Layout (RL): the RL information
records which nodes contain replicas
of each chunk. For chunk C_l, the list is a set of
nodes {N_j}.
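
As an illustration only, the CPL and RL definitions can be written down as plain data structures. The sketch below is not ChunkSim code: the `Chunk` class, the node names, the sizes and the `nodes_holding` helper are all hypothetical, and it assumes that RL lists only the additional copies of a chunk, not its primary placement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    key: str   # C_key: any unique chunk identifier
    size: int  # C_sz: chunk size in bytes


# CPL: a set of (C_l, N_j) pairs (hypothetical example values).
cpl = {
    (Chunk("c1", 4096), "N1"),
    (Chunk("c2", 8192), "N2"),
}

# RL: for each chunk key, the set of nodes {N_j} holding a replica.
# Assumption: the primary node recorded in CPL is not repeated here.
rl = {
    "c1": {"N2", "N3"},
    "c2": {"N1"},
}


def nodes_holding(chunk_key, cpl, rl):
    """Return every node that stores the chunk, as placement or as replica."""
    primary = {node for chunk, node in cpl if chunk.key == chunk_key}
    return primary | rl.get(chunk_key, set())


c1_nodes = nodes_holding("c1", cpl, rl)  # nodes that can serve chunk c1
```

A lookup like `nodes_holding` is the kind of question an availability analysis needs to answer: which nodes can still serve a chunk when one of them fails.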
The ChunkSim parameters discussed above
are summarized in Figure 5.
The WP information is obtained by collecting
runtime statistics, from either a production system
or benchmark setup. A benchmark setup is one
with at least a set of samples of the chunks in each
node, together with replicated dimensions and a
set of expected workload queries. This allows the
simulator to collect expected chunk processing
times for WP.
The CPL and RL information is obtained as a
result of the data allocation and replication actions
described in the next subsection.
ChunkSim Model Parameters
ChunkSim builds a model of a system configuration
from three sets of parameters: Chunk
Placement Layout (CPL), Workload Properties
(WP) and Replication Layout (RL). With these
parameters ChunkSim runs experiments and
delivers a summary analysis of performance and
availability characteristics. The system configuration
parameters may be collected from a running
environment or supplied for a what-if analysis of
system deployment or reorganization.
Workload Properties (WP): a workload is
typically characterized by a set of pairs of queries
and their relative frequency of occurrence in the
workload: W = {(Q_i, f_i)}. For ChunkSim WP purposes,
we also collect the following parameters
for each query Q_i run on each node N_j:
Average processing time of chunk C_l
for query Q_i run on node N_j: t(C_l, Q_i, N_j);
Average per-node result transfer and merge times
for query Q_i: tt(Q_i, N_j) and tm(Q_i, N_j).
The processing time statistics of individual
chunks, t(C_l, Q_i, N_j), may be replaced by a
further summarized version, the average
chunk processing time (i.e. over all
chunks) for query Q_i run on node
N_j: t(C, Q_i, N_j).
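
The summarization step can be sketched as follows: per-chunk averages t(C_l, Q_i, N_j) are collapsed into one average per (query, node) pair. This is a minimal sketch under stated assumptions, not ChunkSim's implementation: the dictionary layout, the `summarize` helper and the sample timings are hypothetical, and the time unit is left unspecified.

```python
from collections import defaultdict
from statistics import mean

# Per-chunk statistic t(C_l, Q_i, N_j), keyed by (chunk, query, node).
# The values are made-up sample timings.
t = {
    ("c1", "Q1", "N1"): 2.0,
    ("c2", "Q1", "N1"): 4.0,
    ("c3", "Q1", "N2"): 3.0,
}


def summarize(per_chunk):
    """Average the per-chunk times over all chunks of each (query, node)."""
    grouped = defaultdict(list)
    for (_chunk, query, node), elapsed in per_chunk.items():
        grouped[(query, node)].append(elapsed)
    return {pair: mean(times) for pair, times in grouped.items()}


# Summarized statistic t(C, Q_i, N_j), keyed by (query, node).
avg = summarize(t)
```

The summarized form stores one value per (query, node) pair instead of one per chunk, at the cost of hiding variation between chunks on the same node.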
Data Allocation and Replication Alternatives
Consider a data set D that is partitioned into C_c
chunks and placed into N_n nodes. For load and
availability balancing, C_c >> N_n.
Data allocation refers to how chunks are
placed in the system (the terms data allocation
and placement are used interchangeably). Data
allocation can be specified either manually or
automatically. Custom Placement (Pl-C) refers
to a system administrator manually specifying the
chunk layout or, more likely, how many chunks
there should be at each individual node, in which
Chunk Placement Layout (CPL): the CPL
refers to information on how chunks are placed
into nodes. Chunk C_l (C_key, C_sz) is characterized
by the chunk key (C_key) and a size in bytes (C_sz).