4.2 Background
In an effort to achieve reliable and efficient data placement, high-level data
management tools such as the reliable file transfer service (RFT),61 the
lightweight data replicator (LDR),51 and the data replication service (DRS)20
were developed. The main motivation for these tools was to transfer byte
streams reliably by automatically retrying after failures such as dropped
connections, machine reboots, and temporary network outages. Most of these
tools are built on top of GridFTP,2 a secure and reliable data transfer
protocol developed especially for high-bandwidth wide area networks.
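The retry-based reliability that these services provide can be illustrated with a short sketch. The transfer_once function, the URLs, and the retry parameters below are hypothetical placeholders rather than the actual RFT or GridFTP interfaces; the point is only that transient failures are absorbed by waiting and retrying.

```python
import time

def transfer_once(src_url: str, dst_url: str) -> None:
    # Hypothetical single-attempt transfer; a real implementation would
    # invoke a GridFTP client (or similar tool) and raise on failure.
    raise NotImplementedError("plug in a real transfer client here")

def reliable_transfer(src_url: str, dst_url: str,
                      max_retries: int = 5, backoff_s: float = 30.0) -> None:
    """Retry a byte-stream transfer until it succeeds or retries run out.

    Dropped connections, machine reboots, and temporary network outages all
    surface here as exceptions; the loop simply waits and tries again.
    """
    for attempt in range(1, max_retries + 1):
        try:
            transfer_once(src_url, dst_url)
            return                              # success
        except Exception as err:                # transient failure
            print(f"attempt {attempt} failed: {err}")
            if attempt == max_retries:
                raise                           # give up after the last retry
            time.sleep(backoff_s * attempt)     # back off before retrying
```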
Beck et al. introduced logistical networking,10 which performs global
scheduling and optimization of data movement, storage, and computation
based on a model that takes into account all of the network's underlying
physical resources. Systems such as the storage resource broker (SRB)8 and
the storage resource manager (SRM)75 were developed to provide a uniform
interface for connecting to heterogeneous data resources. SRB provides a
single front end that can access a variety of back-end storage systems. SRM
is a standard interface specification that permits multiple implementations
on top of different storage systems. SRMs are discussed in detail in Chapter 3.
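The uniform-interface idea behind SRB and SRM can be sketched as a small abstraction layer. The StorageBackend class and the two back ends below are illustrative assumptions, not the actual SRB or SRM APIs.

```python
from abc import ABC, abstractmethod

class StorageBackend(ABC):
    """Uniform front end; each back end hides its own storage protocol."""

    @abstractmethod
    def get(self, logical_name: str) -> bytes: ...

    @abstractmethod
    def put(self, logical_name: str, data: bytes) -> None: ...

class PosixBackend(StorageBackend):
    """Back end for an ordinary disk file system."""
    def __init__(self, root: str):
        self.root = root

    def get(self, logical_name: str) -> bytes:
        with open(f"{self.root}/{logical_name}", "rb") as f:
            return f.read()

    def put(self, logical_name: str, data: bytes) -> None:
        with open(f"{self.root}/{logical_name}", "wb") as f:
            f.write(data)

class TapeArchiveBackend(StorageBackend):
    """Back end for an archival store; a real one would stage files to disk."""
    def get(self, logical_name: str) -> bytes:
        raise NotImplementedError("stage from tape, then read")

    def put(self, logical_name: str, data: bytes) -> None:
        raise NotImplementedError("write to disk cache, then migrate to tape")
```

Client code programs against StorageBackend only, so replacing a disk pool with a tape archive does not change the application.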
GFarm65 provides a global parallel file system with online petascale storage.
Its model specifically targets applications in which the data consist
primarily of a set of records or objects that are analyzed independently.
GFarm exploits this access locality to achieve scalable I/O bandwidth using
a parallel file system integrated with process scheduling and file distribution.
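A minimal sketch of this kind of locality-aware scheduling is shown below; the fragment-to-node map and the plan format are hypothetical, intended only to show computation being moved to where the data already reside, not GFarm's actual scheduler.

```python
# Hypothetical mapping from file fragments to the node storing them.
fragment_location = {
    "events.part-000": "node-a",
    "events.part-001": "node-b",
    "events.part-002": "node-a",
}

def schedule_local_analysis(fragments, analyze_cmd):
    """Assign each independent analysis task to the node holding its fragment.

    Because records are analyzed independently, every node reads only local
    data, so aggregate I/O bandwidth grows with the number of nodes.
    """
    plan = {}
    for frag in fragments:
        node = fragment_location[frag]
        plan.setdefault(node, []).append(f"{analyze_cmd} {frag}")
    return plan

print(schedule_local_analysis(fragment_location, "analyze"))
```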
The Open-source Project for a Network Data Access Protocol (OPeNDAP)
provides software that makes local multidimensional array data accessible to
remote locations regardless of the local storage format.66 OPeNDAP is
discussed in detail in Chapter 10.
OceanStore54 aimed to build a global persistent data store that can scale to
billions of users. The basic idea is that any server may create a local replica
of any data object. These local replicas provide faster access and robustness
against network partitions. Both GFarm and OceanStore require creating
several replicas of the same data, but neither addresses the problem of
scheduling data movement when no replica is close to the computation site.
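The scheduling gap noted here can be made concrete with a small sketch: given a replica catalog and pairwise transfer costs (both hypothetical), the placement decision is either to read a nearby replica or to schedule an explicit transfer when none is close enough.

```python
# Hypothetical replica catalog and transfer-cost table.
replica_catalog = {"dataset-A": ["site-1", "site-3"]}
transfer_cost = {("site-1", "site-2"): 10.0, ("site-3", "site-2"): 2.5}

def nearest_replica(obj, compute_site):
    """Return the replica site with the lowest transfer cost, or None."""
    sites = replica_catalog.get(obj, [])
    return min(sites,
               key=lambda s: transfer_cost.get((s, compute_site), float("inf")),
               default=None)

def placement_decision(obj, compute_site, threshold=5.0):
    """Use a close replica if one exists; otherwise schedule data movement."""
    best = nearest_replica(obj, compute_site)
    if best is None or transfer_cost.get((best, compute_site), float("inf")) > threshold:
        return f"schedule transfer of {obj} to {compute_site}"
    return f"read replica of {obj} at {best}"

print(placement_decision("dataset-A", "site-2"))   # replica at site-3 is close enough
```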
Bent et al.12 introduced a new distributed file system, the Batch-Aware Dis-
tributed File System (BADFS), and a modified data-driven batch scheduling
system.11 Their goal was to achieve data-driven batch scheduling by exporting
explicit control of storage decisions from the distributed file system to the
batch scheduler. Using simple data-driven scheduling techniques, they
demonstrated that the new data-driven system can achieve orders-of-magnitude
throughput improvements over both current distributed file systems such as
the Andrew File System (AFS) and traditional CPU-centric batch scheduling
techniques that rely on remote I/O.
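A rough sketch of the data-driven idea follows; the job list and staging steps are hypothetical and not the actual BADFS implementation, but they show the scheduler, rather than the file system, deciding when shared input is staged in and evicted, so that data is moved once per batch instead of once per job over remote I/O.

```python
from collections import defaultdict

# Hypothetical workload: each job declares the input data set it depends on.
jobs = [
    ("job-1", "calib-2008"), ("job-2", "calib-2008"),
    ("job-3", "calib-2009"), ("job-4", "calib-2009"),
]

def data_driven_schedule(jobs):
    """Group jobs by shared input, staging each data set once per batch."""
    batches = defaultdict(list)
    for job_id, dataset in jobs:
        batches[dataset].append(job_id)

    plan = []
    for dataset, batch in batches.items():
        plan.append(f"stage {dataset} to local storage")   # scheduler-controlled staging
        plan.extend(f"run {job_id} against local copy" for job_id in batch)
        plan.append(f"evict {dataset}")                    # free space for the next batch
    return plan

for step in data_driven_schedule(jobs):
    print(step)
```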