14
Secondary Storage Configurations for
Advanced Data Engineering
Haruo Yokota
Ryota Abe
Tokyo Institute of Technology
ABSTRACT
Applications of advanced data engineering require scalable and reliable secondary
storage systems. These requirements make the system configurations storage-centric
rather than CPU-centric. To implement a scalable storage-centric system, network
attached storage (NAS) and storage area network (SAN) architectures have recently
attracted a great deal of attention. Current technological progress also allows
disk-resident data processing. This capability is useful for managing distributed disks,
as well as for executing application programs on the disks themselves, as in proposed
intelligent disks such as active disks. We propose autonomous disks for self-management in network
environments using their disk-resident data processing capability. Autonomous disks
can handle disk failures and load skews by a combination of active rules and distributed
directories while remaining transparent to the hosts. These will be key functions for
the next generation of storage systems, and are applicable to many advanced
applications, such as a large Web server having many HTML files and B2B e-commerce
frameworks generating enormous XML files. In this chapter, we describe the concept
of autonomous disks by comparing them with ordinary disks.
1 Introduction
Improvements in computer architecture and Internet technology have enabled the
implementation of many types of sophisticated database applications.
We already have high-performance processors and high-bandwidth networks, which
are adequate to realize these applications. Compared with these components, however,
performance improvement in secondary storage systems has been rather slow.
Hence, the scalability and reliability of secondary storage systems have become
among the most significant concerns for high-performance, data-intensive processing.
It is true that recording densities of magnetic hard disk drives have recently
improved tremendously. However, a large-capacity hard disk drive has problems
in performance and reliability. Access latency, caused by head seek and rotational
delay, is an intrinsic restriction on the performance of disk devices. Each access to
a large-capacity disk incurs this latency, whereas parallel accesses to multiple small
disks holding the same amount of data can mask it. Simultaneously, the