increase in capacity of a disk makes the effects of its failure more serious. These
observations suggest that it is preferable to use a large number of small disks to
achieve adequate storage I/O performance. These disks should also be shared by a
number of hosts via a network.
From the point of view of system configuration, the storage system becomes the
center of the entire system; such configurations are called storage-centric
configurations. To implement scalable storage-centric systems, network-attached
storage (NAS) and storage area network (SAN) architectures have recently attracted
a great deal of attention. A NAS is connected directly to an IP network and is
shared by multiple hosts on a local area network (LAN), whereas a SAN uses a
dedicated network, separate from the LAN, that connects storage devices over
serial links such as Fibre Channel.
In these configurations, disks are currently assumed to be passive devices; that
is, all their behavior is controlled by their hosts via the network. Communication
between disks and their hosts is therefore very frequent, which limits I/O
performance. Moreover, to make the system efficient, the placement of data,
including replicas, is very important. Managing data location requires a dedicated
host, and because that host must also control all accesses to the system, it
becomes a bottleneck as the system grows.
Furthermore, the reliability of the total system largely depends on the reliability
of the central host, the management software on it, and its maintenance operations.
It is well known that the reliability of software and operations is rather low
compared with that of hardware 1). Because the system forms a series configuration
for reliability calculation, these reliabilities multiply the base reliability of
the disks. This means the reliability of the system cannot exceed that of its least
reliable component and is generally lower. There are several approaches to
preventing data loss due to a disk failure, such as mirroring, error-correcting
codes, and parity calculation, as described in papers on RAID 2, 3). However,
these approaches are less flexible.
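The series argument above can be made concrete with a little arithmetic. The following Python sketch assumes independent failures and uses hypothetical reliability values chosen only for illustration (they are not taken from the cited studies). In a series configuration the reliabilities multiply, while mirroring places disk copies in parallel, so data is lost only if every copy fails.

```python
from math import prod


def series_reliability(reliabilities: list[float]) -> float:
    """Reliability of a series configuration: the system works only if every
    component works, so (assuming independent failures) the values multiply."""
    return prod(reliabilities)


def mirrored_reliability(disk: float, copies: int = 2) -> float:
    """Reliability of `copies` mirrored disks in parallel: data survives unless
    every copy fails (again assuming independent failures)."""
    return 1.0 - (1.0 - disk) ** copies


# Hypothetical reliabilities over some fixed period, for illustration only.
DISK = 0.99
HOST_SOFTWARE = 0.95
OPERATIONS = 0.90

plain = series_reliability([DISK, HOST_SOFTWARE, OPERATIONS])
mirrored = series_reliability([mirrored_reliability(DISK), HOST_SOFTWARE, OPERATIONS])

print(f"single disk in series:    {plain:.4f}")
print(f"mirrored disks in series: {mirrored:.4f}")
# Neither value can exceed the reliability of the weakest component (0.90).
```

With these illustrative numbers, mirroring raises the disk term from 0.99 to 0.9999, yet the system still stays below the 0.95 × 0.90 = 0.855 ceiling set by the host software and operations alone. Redundancy at the disk level cannot compensate for the weaker components in series.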
2 Autonomous Disks
To make a storage-centric system scalable by removing performance bottlenecks, and
reliable by eliminating complicated central control, distributed autonomous control
of the storage nodes is essential.
Disk-resident data processing is a timely topic that has recently attracted a great
deal of attention. Technological progress, such as compact high-performance
microprocessors for disk controllers and large semiconductor memories for disk
caches, makes this capability possible. We propose the concept of autonomous disks,
which use their disk-resident processing capability to manage the data stored in
them 4).
Several other academic research projects exploit the capability of disks to execute
application programs: the IDISK project at UC Berkeley 5) and the Active Disk
projects at Carnegie Mellon 6) and UC Santa Barbara/Maryland 7). They focus on the
functions and mechanisms that let a disk-resident processor and a host together
execute storage-centric user applications, such as