Secondary Storage Configurations for Advanced Data Engineering - Nontraditional Database Systems

increase in capacity of a disk makes the effects of its failure more serious. From these observations, it is preferable to use a large number of small disks to achieve adequate storage I/O performance. These disks should also be shared by a number of hosts over a network.
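
The trade-off in the preceding paragraph can be sketched with a toy model. The sketch assumes, hypothetically, that every disk delivers the same number of I/O operations per second regardless of its capacity, and that a failure affects only the data on the failed disk. The `DiskArray` class and all figures are illustrative, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskArray:
    """N identical disks sharing a fixed total capacity (hypothetical model)."""
    total_capacity_gb: float
    num_disks: int
    iops_per_disk: float  # assumption: independent of a disk's capacity

    @property
    def capacity_per_disk_gb(self) -> float:
        return self.total_capacity_gb / self.num_disks

    @property
    def aggregate_iops(self) -> float:
        # Independent disks can serve requests in parallel.
        return self.num_disks * self.iops_per_disk

    @property
    def data_exposed_per_failure_gb(self) -> float:
        # A single disk failure affects only the data stored on that disk.
        return self.capacity_per_disk_gb


one_large = DiskArray(total_capacity_gb=8000, num_disks=1, iops_per_disk=150)
many_small = DiskArray(total_capacity_gb=8000, num_disks=16, iops_per_disk=150)

for name, array in [("1 large disk", one_large), ("16 small disks", many_small)]:
    print(f"{name}: {array.aggregate_iops:.0f} IOPS, "
          f"{array.data_exposed_per_failure_gb:.0f} GB at risk per failure")
```

Under these assumptions, splitting the same 8,000 GB across 16 disks multiplies aggregate throughput by 16 and shrinks the data exposed to any single failure to one sixteenth.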

From the point of view of system configuration, the storage system becomes the center of the entire system; such configurations are called storage-centric configurations. To implement a scalable storage-centric system, network-attached storage (NAS) and storage area network (SAN) architectures have recently attracted a great deal of attention. A NAS device is connected directly to an IP network and is shared by multiple hosts on a local area network (LAN), whereas a SAN is a dedicated network, separate from the LAN, that connects storage devices through serial links such as Fibre Channel.

In these configurations, disks are currently assumed to be passive devices; that is, all of their behavior is controlled by their hosts over the network. Communication between disks and their hosts is therefore very frequent, and this limits I/O performance. Moreover, to make the system efficient, the placement of data, including replicas, is very important. Managing data locations requires a dedicated host, and because this host must also control every access to the system, it becomes a bottleneck as the system grows.
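
A back-of-the-envelope model shows why a central host limits scalability. The function `system_throughput` and its parameters are hypothetical: it assumes that every disk access costs a fixed number of host messages (`msgs_per_op`) and that the host can process a fixed number of messages per second (`host_msgs_per_s`).

```python
def system_throughput(num_disks: int, disk_ops_per_s: float,
                      host_msgs_per_s: float, msgs_per_op: float) -> float:
    """Operations per second when one central host mediates every access."""
    # What the disks could deliver if nothing else limited them.
    disk_limit = num_disks * disk_ops_per_s
    # What the host can mediate, given the messages each operation costs.
    host_limit = host_msgs_per_s / msgs_per_op
    return min(disk_limit, host_limit)


# Hypothetical figures: 150 ops/s per disk, host handles 40,000 msgs/s,
# and each disk access costs 4 host messages.
for n in (4, 16, 64, 256):
    print(n, system_throughput(n, 150, 40_000, 4))
```

Adding disks raises throughput only until the host limit is reached (10,000 operations per second with these made-up numbers); beyond that point, additional disks contribute nothing.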

Furthermore, the reliability of the total system depends largely on the reliability of the central host, the management software running on it, and its maintenance operations. It is well known that the reliability of software and of operations is rather low compared with that of hardware 1). These reliabilities multiply the base reliability of the disks, because for the purpose of reliability calculation the system is a series configuration. Consequently, the reliability of the system is lower than that of its least reliable component. There are several approaches to preventing data loss due to a disk failure, such as mirroring, error-correcting codes, and parity calculation, as described in papers on RAID 2, 3). However, these approaches are less flexible.
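
The series argument can be made concrete. In a series configuration the system works only if every component works, so, assuming independent failures, the system reliability is the product of the component reliabilities. A two-way mirror of a component with reliability r fails only if both copies fail, giving 1 - (1 - r)^2. The component names and values below are hypothetical, chosen only to illustrate the calculation.

```python
from math import prod


def series_reliability(components: list[float]) -> float:
    """System works only if every component works (independent failures)."""
    return prod(components)


def mirrored(r: float) -> float:
    """A two-way mirror fails only if both copies fail."""
    return 1 - (1 - r) ** 2


# Hypothetical reliabilities over some fixed period; not measured values.
disk = 0.99
host_hardware = 0.995
management_software = 0.97
operations = 0.95

plain = series_reliability([disk, host_hardware, management_software, operations])
with_mirror = series_reliability(
    [mirrored(disk), host_hardware, management_software, operations])

print(f"series system:       {plain:.4f}")
print(f"with mirrored disks: {with_mirror:.4f}")
```

With these figures, mirroring the disks raises system reliability only from about 0.908 to about 0.917, because the software and operations terms still dominate the product.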

2 Autonomous Disks

Making a storage-centric system scalable (by removing performance bottlenecks) and reliable (by eliminating complicated central control) requires distributed, autonomous control of the storage nodes.

Disk-resident data processing has recently attracted a great deal of attention. Technological progress, such as compact high-performance microprocessors for disk controllers and large semiconductor memories for disk caches, has made this capability possible. We propose the concept of autonomous disks, which use their disk-resident processing capability to manage the data stored in them 4).
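
The idea of disks that manage their own data can be illustrated with an in-process simulation. This is a sketch under stated assumptions, not the mechanism proposed in 4): the `AutonomousDisk` class, its hash-based placement rule, and the choice of keeping one replica on the next disk are all hypothetical. What it shows is that any disk can accept a request and forward it to the disk that owns the data, and that the owning disk maintains the replica itself, so no central host takes part in placement or access.

```python
from __future__ import annotations

import hashlib


class AutonomousDisk:
    """Hypothetical disk node that handles requests itself, with no central host."""

    def __init__(self, disk_id: int, cluster: list[AutonomousDisk]) -> None:
        self.disk_id = disk_id      # assumed equal to this disk's index in cluster
        self.cluster = cluster      # shared view of all disks (assumed static)
        self.primary: dict[str, bytes] = {}
        self.replicas: dict[str, bytes] = {}

    def _owner_index(self, key: str) -> int:
        # Hypothetical placement rule: hash the key onto a disk index.
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.cluster)

    def put(self, key: str, value: bytes) -> None:
        owner = self.cluster[self._owner_index(key)]
        if owner is not self:
            owner.put(key, value)   # forward disk-to-disk; no host involved
            return
        self.primary[key] = value
        # The owning disk itself keeps one replica on the next disk.
        backup = self.cluster[(self.disk_id + 1) % len(self.cluster)]
        backup.replicas[key] = value

    def get(self, key: str) -> bytes | None:
        owner = self.cluster[self._owner_index(key)]
        if owner is not self:
            return owner.get(key)
        return self.primary.get(key)


cluster: list[AutonomousDisk] = []
for i in range(4):
    cluster.append(AutonomousDisk(i, cluster))

cluster[0].put("order:42", b"payload")   # any disk can accept the request
print(cluster[3].get("order:42"))        # and any disk can serve the lookup
```

A real system would also need failure detection, replica recovery, and a way to keep the disks' shared view of the cluster consistent; the sketch assumes a fixed set of healthy disks.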

Several other academic research projects also exploit the processing capability of disks to execute application programs: the IDISK project at UC Berkeley 5) and the Active Disk projects at Carnegie Mellon 6) and UC Santa Barbara/Maryland 7). They focus on the functions and mechanisms that allow a disk-resident processor and a host to jointly execute storage-centric user applications, such as
