as predicted. Instead, most MSSs holding several decades' worth of data still
have an average file size measured in megabytes. The amount of metadata
increases as users store more and more relatively small files in MSSs, with
minimal regard for managing the data. Metadata includes file names, descriptions
of file content, file versions, and information that helps relate files to each
other (e.g., to create datasets). Poor metadata operation rates are a hindrance
in managing a large number of files. This phenomenon contributes to a paradox in
data storage called the sustainability paradox, which states that as the world
continues to generate more data, we are in greater danger of losing the data we
really wish to keep. This is because the important datasets become a
smaller and smaller portion of a user's total retained data.
As MSSs amass data over the years, users often find the limited metadata
file-management capabilities and lackluster metadata performance of these
systems a burden when managing ever-increasing amounts of data. To cope,
users often resort to rudimentary data-management techniques
such as establishing file-naming conventions or using aggregation techniques
that limit the number of files by grouping related files into a single file. More
advanced data-management techniques include establishing a database external
to the MSS to keep track of additional metadata. The most advanced
techniques now emerging are systems that hide the
complexities of dealing with tape, disk, and hierarchical storage management
(HSM). One example of this approach is the SRM standard [3]. There are
several implementations of SRM software in use today, which generally focus
on providing a single common view of a user's data regardless of the specific
MSS or file system. These implementations can be part of the HSM or external
to it. Examples of HSM systems that have embedded SRM implementations
are CASTOR, developed at CERN [4], and Enstore, developed
at Fermi National Accelerator Laboratory [5]. Examples of external SRM
implementations are Berkeley Storage Manager (BeStMan) [6], developed
at Lawrence Berkeley National Laboratory for HPSS [14], and Storage Resource
Manager (StoRM) [7], developed in Italy for GPFS. SRM software provides a
common set of commands to help manage data across systems that may not
share command sets or provide similar functionality. However, SRMs currently
do not expose the performance a user can expect from an MSS (see the
Conclusions and Future Work section).
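The two simpler techniques described above, aggregating related small files into a single file and keeping an external database of additional metadata, can be sketched as follows. This is only an illustrative sketch in Python using the standard tarfile and sqlite3 modules; the function names, the catalog schema, and the choice of tar and SQLite are assumptions made for the example and are not taken from any particular MSS or SRM implementation.

```python
import io
import sqlite3
import tarfile


def aggregate_files(files, archive_path):
    """Pack many small files into one tar archive (hypothetical helper).

    `files` maps a member name to its content as bytes. Storing one
    archive instead of many small files reduces the number of entries
    the MSS has to track.
    """
    with tarfile.open(archive_path, "w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def build_catalog(conn, archive_path, dataset, descriptions):
    """Record per-file metadata in a database kept outside the MSS.

    The schema is an assumption for this sketch: one row per original
    file, pointing at the archive that now contains it.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS catalog ("
        "name TEXT PRIMARY KEY, archive TEXT, dataset TEXT, "
        "size INTEGER, description TEXT)"
    )
    with tarfile.open(archive_path, "r") as tar:
        rows = [
            (m.name, archive_path, dataset, m.size,
             descriptions.get(m.name, ""))
            for m in tar.getmembers()
        ]
    conn.executemany(
        "INSERT OR REPLACE INTO catalog VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()


def find_archive(conn, name):
    """Return the archive holding `name`, or None, without querying the MSS."""
    row = conn.execute(
        "SELECT archive FROM catalog WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

In this sketch a user would call aggregate_files once per dataset, store only the resulting archive in the MSS, and consult the catalog (for example through find_archive) to decide which archive to retrieve. The trade-off is that the catalog must be kept consistent with the MSS contents by the user, which is the kind of burden that SRM-style systems aim to remove.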
3.2.2 Making Mass Storage Systems Available through SRM Interfaces
Given the complexity of MSSs, it is not always straightforward to make
SRM functionality part of such systems. In practice, there are two
possible approaches: extend the MSS to support the SRM functionality, or
provide an external layer that fronts the MSS. It turns out that both approaches
are valid. If the developers of the MSS are involved in SRM projects, they