Fig. 6. Distribution of reconstruction time in a 64-node, 4-hour experiment compared to simulation (x-axis: reconstruction time in seconds; experimentation on g5k: mean = 148 seconds, std. dev. = 76; simulation: mean = 145 seconds, std. dev. = 81).
Storage System Description. In a few words, the system is made of a storage layer (upper layer) built on top of a DHT layer (lower layer) running Pastry [13]. The lower layer is in charge of managing the logical topology: finding devices, routing, and alerting of device arrivals or departures. The upper layer is in charge of storing and monitoring the data.
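To make the layering concrete, the sketch below shows one possible way to express this split in Python; the class names, the listener mechanism, and the method signatures are illustrative assumptions, not the system's actual interfaces.

    # Minimal sketch of the two-layer split; all names are illustrative
    # assumptions, not the system's actual interfaces.

    class DHTLayer:
        """Lower layer: manages the logical topology (Pastry-like leafset)."""

        def __init__(self):
            self.leafset = set()    # identifiers of neighboring devices
            self.listeners = []     # upper-layer callbacks

        def register(self, listener):
            self.listeners.append(listener)

        def notify_departure(self, device_id):
            # Called when a periodic liveness check detects a failed device.
            self.leafset.discard(device_id)
            for listener in self.listeners:
                listener.on_device_left(device_id)


    class StorageLayer:
        """Upper layer: stores data blocks and reacts to topology events."""

        def __init__(self, dht):
            self.dht = dht
            self.dht.register(self)
            self.blocks = {}        # block_id -> list of fragment holders

        def on_device_left(self, device_id):
            # Failure handling is detailed in the monitoring sketch below.
            pass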
Storing the Data. The system uses Reed-Solomon erasure codes [15] to introduce redundancy. Each data block has a device responsible for monitoring it. This device keeps a list of the devices storing a fragment of the block. The fragments of a block are stored locally on the Pastry leafset of the device in charge [16].
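The placement policy can be illustrated by the following sketch, which assumes a block is encoded into s data fragments plus r redundancy fragments and abstracts the Reed-Solomon encoding itself away; place_block and its arguments are hypothetical names.

    # Illustrative placement sketch; the erasure encoding itself is abstracted
    # away and all names are hypothetical.

    def place_block(block_id, leafset, s, r, fragment_holders):
        """Assign the s + r fragments of a block to members of the leafset
        of the device in charge, and record who stores what."""
        holders = sorted(leafset)[: s + r]
        if len(holders) < s + r:
            raise ValueError("leafset too small to hold all fragments")
        # The device in charge keeps the list of fragment holders so that it
        # can monitor the block afterwards.
        fragment_holders[block_id] = holders
        return holders

    if __name__ == "__main__":
        table = {}
        leafset = {f"device-{i}" for i in range(16)}
        print(place_block("block-42", leafset, s=8, r=3, fragment_holders=table))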
Monitoring the System. The storage system uses the information given by the lower layer to discover device failures. In Pastry, a device periodically checks whether the members of its leafset are still up and running. When the upper layer receives a message that a device has left, the device in charge updates the status of its blocks.
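The reaction to such a departure message could look roughly like the sketch below, under the standard Reed-Solomon assumption that any s of the s + r fragments are enough to rebuild a block; the function and variable names are illustrative.

    # Sketch of the reaction to a departure message; assumes any s of the
    # s + r fragments suffice to rebuild a block. Names are illustrative.

    def on_device_left(device_id, fragment_holders, s, reconstruction_queue):
        """Update the status of every block that had a fragment on the failed
        device, queueing a reconstruction when the block is still repairable."""
        for block_id, holders in fragment_holders.items():
            if device_id in holders:
                holders.remove(device_id)
                if len(holders) >= s:
                    reconstruction_queue.append(block_id)  # repair is possible
                else:
                    print(f"{block_id} is dead: {len(holders)} fragments left")

    if __name__ == "__main__":
        table = {"block-42": ["device-0", "device-1", "device-2"]}
        queue = []
        on_device_left("device-1", table, s=2, reconstruction_queue=queue)
        print(table, queue)  # block-42 keeps 2 fragments and is queued for repair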
Monitored Metrics. The application monitors and keeps statistics on the amount of data stored on its disks, the number of performed reconstructions along with their durations, and the number of dead blocks that cannot be reconstructed. The upload and download bandwidths of devices can be adjusted.
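For illustration only, these counters could be grouped as in the following sketch; the field names mirror the list above and are assumptions, not the application's actual data structures.

    # One possible grouping of the statistics listed above; field names are
    # assumptions that mirror the list.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class StorageMetrics:
        bytes_stored: int = 0                   # amount of data on local disks
        reconstructions: int = 0                # number of performed repairs
        reconstruction_times: List[float] = field(default_factory=list)  # seconds
        dead_blocks: int = 0                    # blocks that cannot be rebuilt
        upload_bw: float = 0.0                  # adjustable, in bytes per second
        download_bw: float = 0.0                # adjustable, in bytes per second

        def record_reconstruction(self, duration_s: float) -> None:
            self.reconstructions += 1
            self.reconstruction_times.append(duration_s)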
Results. There exist many different storage systems with different parameters and different reconstruction processes. The goal of this paper is not to precisely tune a model to a specific one, but to provide a general analytical framework able to predict the behavior of any storage system. Hence, we are more interested here in the global behavior of the metrics than in their absolute values.
Studied Scenario. Simulations easily allow us to evaluate several years of a system's lifetime, but this is not the case for experimentation: the time available for a single experiment is constrained to a few hours. Hence, we define an acceleration factor as the ratio between the experiment duration and the duration of the real system we want to imitate. Our goal is to check the bandwidth congestion in a real environment. Thus, we decided to shrink the disk size (e.g., from 10 GB to 100 MB, a reduction by a factor of 100), inducing a much smaller time to repair a failed disk. Then, the device failure rate is increased (from months to a few hours) to keep the ratio between disk failures and repair time proportional. The bandwidth limit, however, is kept close to that of a "real" system, the idea being to avoid inducing strange behaviors due to very small packets being transmitted in the network.
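This scaling can be made concrete with the small sketch below, which assumes an example mean time to failure of three months; the helper function is purely illustrative.

    # Worked sketch of the scaling: shrink the disk size by a factor and
    # increase the failure rate by the same factor so that the ratio between
    # repair time and failure inter-arrival stays proportional; the bandwidth
    # limit is left untouched. The 3-month MTTF is an assumed example value.

    def accelerate(disk_size_bytes, mean_time_to_failure_s, factor):
        """Return the scaled-down disk size and mean time to failure."""
        return disk_size_bytes / factor, mean_time_to_failure_s / factor

    if __name__ == "__main__":
        GB, MB = 10**9, 10**6
        MONTH = 30 * 24 * 3600
        disk, mttf = accelerate(10 * GB, 3 * MONTH, factor=100)
        print(f"disk: {disk / MB:.0f} MB, MTTF: {mttf / 3600:.1f} hours")
        # disk: 100 MB, MTTF: 21.6 hours; bandwidth limits stay at their
        # "real" values to avoid artifacts from very small packets.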