Redundant Data Update (SRDU) algorithm for updating redundant data in erasure-coded
distributed storage. This algorithm exploits the structure of Reed-Solomon erasure correction
codes to enable the reuse of old redundant data in generating the new redundant data. This
reuse allows the algorithm to cut the amount of data transferred by as much as 70%. It is
worth noting that the algorithms presented in this chapter are not limited to parallel server
systems, but are also applicable to any striped data storage, such as disk arrays, RAID [1],
or even emerging peer-to-peer systems. For this reason we will use the term
“node” to refer to a device in the striped storage, e.g., a server in a parallel streaming server, a
disk in a RAID, or a user host in a peer-to-peer streaming system.
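To make the reuse idea concrete before the details arrive, the following Python sketch assumes a Reed-Solomon-style code over GF(2^8) in which each redundant block is a linear combination of the data blocks; the coefficients, block contents, and helper names (gf_mul, update_parity) are illustrative assumptions, not taken from the chapter. Because the code is linear, a change to one data block can be folded into the old parity by sending only the delta:

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using the common polynomial 0x11D."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D  # reduce modulo x^8 + x^4 + x^3 + x^2 + 1
    return p

def update_parity(old_parity: bytes, coeff: int,
                  old_block: bytes, new_block: bytes) -> bytes:
    """Recompute a redundant block from its old value and the data delta.

    In GF(2^8) addition and subtraction are both XOR, so
    new_parity = old_parity + coeff * (new_block - old_block)."""
    return bytes(p ^ gf_mul(coeff, o ^ n)
                 for p, o, n in zip(old_parity, old_block, new_block))

# Hypothetical 4-byte blocks: d0 changes, and parity = g0*d0 + g1*d1.
g0, g1 = 2, 3
d0_old, d0_new = b"\x10\x20\x30\x40", b"\x11\x22\x33\x44"
d1 = b"\xAA\xBB\xCC\xDD"
parity_old = bytes(gf_mul(g0, a) ^ gf_mul(g1, b) for a, b in zip(d0_old, d1))

parity_new = update_parity(parity_old, g0, d0_old, d0_new)
full = bytes(gf_mul(g0, a) ^ gf_mul(g1, b) for a, b in zip(d0_new, d1))
assert parity_new == full  # reusing old parity matches full recomputation

Only the delta d0_old XOR d0_new travels to the redundancy node, rather than every data block the parity depends on, which is the source of the transfer savings.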
15.2 Related Work
The problem of data reorganization has been studied previously in the context of disk arrays
[2, 3]. The work by Ghandeharizadeh and Kim [2] is the earliest study on data reorganization
known to the author. They investigated the data reorganization problem in the context of adding
disks to a continuous media server. They studied the round-robin data striping commonly found
in disk arrays and proposed techniques to perform data reorganization online, i.e., without
disrupting on-going video streams. Due to the round-robin placement requirement, a large
portion of the data blocks will need to be redistributed to maintain the data placement order when
a new disk is added, thus incurring significant data reorganization overhead. The advantage
is that this approach can maintain perfect streaming load balance when data reorganization is
completed.
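To see why preserving round-robin order is expensive, the short sketch below (with hypothetical block and disk counts) counts the blocks whose home disk changes when an array grows from n to n+1 disks:

def moved_fraction(num_blocks: int, n: int) -> float:
    """Fraction of blocks whose disk changes under round-robin restriping.

    Block j lives on disk j mod n before expansion and j mod (n+1) after."""
    moved = sum(1 for j in range(num_blocks) if j % n != j % (n + 1))
    return moved / num_blocks

for n in (4, 8, 16):
    print(f"{n} -> {n + 1} disks: {moved_fraction(10_000, n):.1%} of blocks move")
# Prints roughly n/(n+1), i.e., 80% or more of the blocks, versus the
# 1/(n+1) needed just to populate the new disk.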
In a more recent study, Goel et al. [3] proposed an algorithm called SCADDAR for data
placement and data reorganization in disk arrays. In this algorithm, each data block
is initially distributed to one of the disks at random with equal probability. When a new
disk is added to the array, each block obtains a new sequence number according to the
randomized SCADDAR algorithm. If the remainder of this number, taken modulo the new
number of disks, equals the disk number of the newly added disk, the block is moved to the
new disk; otherwise, it stays on its original disk. As SCADDAR no longer needs to maintain a strict
round-robin placement order, it can reduce the reorganization overhead to levels approaching
the theoretical lower bound.
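The remainder test can be sketched as follows. This is a simplified stand-in for SCADDAR, with hash-derived sequence numbers assumed in place of the actual pseudo-random sequence of [3]:

import hashlib

def seq_number(block_id: int, generation: int) -> int:
    """Deterministic pseudo-random sequence number per block per expansion."""
    digest = hashlib.sha256(f"{block_id}:{generation}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def add_disk(location: list, n: int, generation: int) -> int:
    """Add disk n to an n-disk array in place; return how many blocks moved."""
    moved = 0
    for b in range(len(location)):
        if seq_number(b, generation) % (n + 1) == n:
            location[b] = n   # remainder matches the new disk: move
            moved += 1        # all other blocks stay put
    return moved

num_blocks, n = 100_000, 4
location = [seq_number(b, 0) % n for b in range(num_blocks)]  # random start
moved = add_disk(location, n, generation=1)
print(f"moved {moved / num_blocks:.1%} of blocks "
      f"(lower bound 1/{n + 1} = {1 / (n + 1):.1%})")

Only about one block in n+1 moves, which is exactly the amount needed to give the new disk its fair share of the data, hence the near-minimal overhead.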
However, the SCADDAR algorithm did not consider streaming load balance. If we apply
SCADDAR to a streaming server, the pseudo-random placement policy can result
in significant streaming load imbalance, especially after a large number of disks have
been added to the system. This load imbalance complicates data transmission scheduling
and may also reduce the usable streaming capacity and/or increase the response time of the
system.
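A small simulation with illustrative parameters makes the imbalance concrete; under uniform random placement the busiest disk serves measurably more than its fair share of a video's blocks, whereas round-robin striping would give every disk exactly the same count:

import random

random.seed(7)
num_disks, blocks_per_video = 16, 2_000
load = [0] * num_disks
for _ in range(blocks_per_video):
    load[random.randrange(num_disks)] += 1  # uniform random placement

mean = blocks_per_video / num_disks
print(f"fair share per disk: {mean:.0f} blocks")
print(f"busiest disk:        {max(load)} blocks "
      f"({max(load) / mean:.2f}x the fair share)")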
Interestingly, the previous two studies can be considered as two extremes of the trade-off
between data reorganization overhead and streaming load balance. In particular, Ghandeharizadeh
and Kim's algorithm achieves perfect load balance at the expense of substantial data
reorganization overhead, while the SCADDAR algorithm achieves near-minimal data
reorganization overhead at the expense of load imbalance. The Row-Permutated Data
Reorganization described in Section 15.3, by comparison, can achieve perfect streaming load
balance and yet incurs significantly lower reorganization overhead than the round-robin
algorithm.