Figure 5. ChunkSim workload properties, placement and replication layout parameters
planning and analysis.
The chunk key is the partitioning criteria value
that determined the chunk (ChunkSim needs
only a unique chunk identifier), and chunk size
is its resulting size in bytes. CPL is a set of pairs
{(C_l, N_j)}.
Replication Layout (RL): RL information
maintains a list of which nodes contain replicas
of each chunk. For chunk C_l, the list is a set of
nodes {N_j}.
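A minimal sketch of how the CPL and RL described above might be held in memory. The names Chunk and build_rl, the sample keys, sizes and node labels are illustrative assumptions, not part of ChunkSim, and keeping replicas separate from the primary placement is one possible reading of the text.

```python
from collections import defaultdict
from typing import NamedTuple


class Chunk(NamedTuple):
    """Chunk C_l, characterized by its key and size (hypothetical type)."""
    key: str   # C_key: partitioning value, used only as a unique identifier
    size: int  # C_sz: chunk size in bytes


def build_rl(pairs):
    """Group (chunk, node) pairs into a map: chunk key -> set of nodes."""
    rl = defaultdict(set)
    for chunk, node in pairs:
        rl[chunk.key].add(node)
    return dict(rl)


# Made-up sample layout: CPL as a set of (C_l, N_j) pairs.
c1 = Chunk("2007-01", 4_000_000)
c2 = Chunk("2007-02", 3_500_000)
cpl = {(c1, "N1"), (c2, "N2")}

# Made-up replica placements, again given as (C_l, N_j) pairs.
replicas = {(c1, "N2"), (c1, "N3"), (c2, "N1")}
rl = build_rl(replicas)  # for each chunk key, the set of nodes {N_j}
```

Keying the RL by chunk key follows the remark that ChunkSim needs only a unique chunk identifier.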
The ChunkSim parameters discussed above
are summarized in Figure 5.
The WP information is obtained by collecting
runtime statistics, from either a production system
or benchmark setup. A benchmark setup is one
with at least a set of samples of the chunks in each
node, together with replicated dimensions and a
set of expected workload queries. This allows the
simulator to collect expected chunk processing
times for WP.
The CPL and RL information is obtained as a
result of the data allocation and replication actions
described in the next subsection.
ChunkSim Model Parameters
ChunkSim builds a model of a system configuration
from three sets of parameters: Chunk
Placement Layout (CPL), Workload Properties
(WP) and Replication Layout (RL). With these
parameters ChunkSim runs experiments and
delivers a summary analysis of performance and
availability characteristics. The system configuration
parameters may be collected from a running
environment or be specified for a what-if analysis
of system deployment or reorganization.
Workload Properties (WP): a workload is
typically characterized by a set of pairs of queries
and their relative frequency of occurrence in the
workload: W = {(Q_i, f_i)}. For ChunkSim WP
purposes, we also collect the following parameters
for each query Q_i run on each node N_j:
Processing time statistic (average) of chunk C_l
for query Q_i run on node N_j: t(C_l, Q_i, N_j);
Per-node result transfer and merge time
statistics (average) for query Q_i: tt(Q_i, N_j),
tm(Q_i, N_j);
The processing time statistics of individual
chunks t(C_l, Q_i, N_j) may be replaced by a
further summarized version, the average
chunk processing time (i.e. over all
chunks) statistic for query Q_i run on node
N_j: t(C, Q_i, N_j);
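The summarization in the last item can be sketched as follows: per-chunk statistics t(C_l, Q_i, N_j) are collapsed into t(C, Q_i, N_j) by averaging over all chunks for each (query, node) pair. The function name, the dictionary layout and the sample values are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def summarize_chunk_times(t):
    """Collapse t(C_l, Q_i, N_j) into t(C, Q_i, N_j) by averaging over chunks.

    `t` maps (chunk_key, query, node) -> average processing time.
    """
    grouped = defaultdict(list)
    for (_chunk, query, node), elapsed in t.items():
        grouped[(query, node)].append(elapsed)
    return {key: mean(values) for key, values in grouped.items()}


# Made-up sample statistics, for illustration only.
t = {
    ("c1", "Q1", "N1"): 2.0,
    ("c2", "Q1", "N1"): 4.0,
    ("c3", "Q1", "N2"): 5.0,
}
summary = summarize_chunk_times(t)  # keyed by (query, node)
```

The summarized form stores one value per (query, node) pair instead of one per chunk, at the cost of hiding differences between chunks.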
Data Allocation and
Replication Alternatives
Consider a data set D that is partitioned into C_c
chunks and placed into N_n nodes. For load and
availability balancing, C_c >> N_n.
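One way to see why having many more chunks than nodes helps balancing is sketched below. The sketch assumes round-robin placement and equal-sized chunks; both are assumptions chosen for illustration and are not prescribed by the text. Under them, node loads differ by at most one chunk, and that one chunk is a smaller share of a node's load as C_c grows.

```python
def round_robin_counts(num_chunks, num_nodes):
    """Chunks per node when chunk l is placed on node l mod num_nodes."""
    counts = [0] * num_nodes
    for l in range(num_chunks):
        counts[l % num_nodes] += 1
    return counts


def imbalance(counts):
    """Ratio of the most loaded to the least loaded node (equal-sized chunks)."""
    return max(counts) / min(counts)


few = round_robin_counts(10, 4)      # C_c only slightly larger than N_n
many = round_robin_counts(1002, 4)   # C_c >> N_n

print(few, imbalance(few))
print(many, imbalance(many))
```

In this example the ratio between the most and least loaded node is 1.5 with 10 chunks and stays below 1.01 with 1002 chunks.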
Data allocation refers to how chunks are
placed in the system (the terms data allocation
and placement are used interchangeably). Either
manual or automatic specification of data allocation
is possible. Custom Placement (Pl-C) refers
to a system administrator manually specifying the
chunk layout or, more likely, how many chunks
there should be at each individual node, in which
Chunk Placement Layout (CPL): the CPL
refers to information on how chunks are placed
into nodes. Chunk C_l(C_key, C_sz) is characterized
by the chunk key (C_key) and a size in bytes (C_sz).