tools to design scientific workflows that are easily understood, ensuring that
the workflows are well-formed, and workflow systems that are fault tolerant.
Fault tolerance is important in particular for long-running workflows, so that
they can recover even if the machine they are running on temporarily fails.
We see three major emerging trends that may contribute to future developments.
The first, energy-efficient storage systems, stems from the escalating
energy costs of spinning and cooling disk storage systems in data centers.
The second, co-location of data and analysis, stems from the impracticality
of having to move large volumes of data to scientists' sites. The third, the
development of new scientific database management systems, stems from the
wish to isolate the scientist from having to deal with various data formats and
the details of file and storage systems. Next, we describe each of these trends
in more detail.
Energy-Efficient Storage Systems
Several studies indicate that disk storage systems and their cooling consume
over 30% of the power in scientific data centers. This percentage of disk stor-
age power consumption will continue to increase, as faster and higher capacity
disks are deployed with increasing energy costs and as data-intensive applica-
tions demand reliable on-line access to data resources. As a result, optimiza-
tion of energy use in scientific data management has become an important
area of research across multiple disciplines such as computer architecture,
power management, operations research, and theoretical computer science.
Concern about the amount of energy used by scientific data centers has also
led to the introduction of commercial products in the area of energy-efficient
or “green” data centers. At the system level, a number of integrated storage
solutions have emerged, all of which are based on the general principle of
spinning down disks when not in use and spinning up disks when they are
being accessed. In these systems, disks configured either as RAID sets, or as
independent disks, are programmed to be spun down into standby mode after
experiencing an interval of time without any activity (idle time). The length of
the idle interval, also called the idleness threshold, can be fixed or determined
dynamically based on historical access data. In general, longer idle periods offer
more energy saving opportunities. For this reason, several storage vendors are
now offering hybrid disks that incorporate an SSD cache holding commonly
accessed data, which lengthens the idle periods of the disk. A
major research problem is that of determining optimal idleness thresholds
that also satisfy quality of service requirements in terms of expected system
response time. Recent research has shown that allowing reorganization
of disk contents either dynamically or at periodic reorganization
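The fixed idleness-threshold policy described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a model of any vendor's product: the names `DiskModel` and `simulate_fixed_threshold` are hypothetical, the power and spin-up figures are illustrative placeholders rather than measured values, requests are treated as instantaneous, and spin-down energy is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskModel:
    # Illustrative placeholder values; not taken from the text or a datasheet.
    idle_watts: float = 8.0        # power while spinning but idle
    standby_watts: float = 1.0     # power while spun down
    spinup_joules: float = 100.0   # extra energy for one spin-up
    spinup_seconds: float = 10.0   # delay seen by the request that wakes the disk


def simulate_fixed_threshold(arrivals, threshold, disk=DiskModel()):
    """Simulate a fixed idleness threshold over sorted request arrival times.

    Returns (energy_joules, spinups, total_delay_seconds) for the idle gaps
    between consecutive requests. Simplification: a spin-up delay does not
    shift the arrival times of later requests.
    """
    energy = 0.0
    spinups = 0
    delay = 0.0
    for prev, cur in zip(arrivals, arrivals[1:]):
        gap = cur - prev
        if gap <= threshold:
            # The disk never reaches the threshold, so it keeps spinning.
            energy += gap * disk.idle_watts
        else:
            # Spin idle until the threshold, then stand by until the next request.
            energy += threshold * disk.idle_watts
            energy += (gap - threshold) * disk.standby_watts
            energy += disk.spinup_joules
            spinups += 1
            delay += disk.spinup_seconds
    return energy, spinups, delay


if __name__ == "__main__":
    arrivals = [0, 5, 100]  # seconds
    print(simulate_fixed_threshold(arrivals, threshold=20))            # (375.0, 1, 10.0)
    print(simulate_fixed_threshold(arrivals, threshold=float("inf")))  # (800.0, 0, 0.0)
```

With these placeholder numbers, spinning down after 20 idle seconds uses 375 J instead of 800 J for an always-on disk, at the cost of one 10-second spin-up delay. That is the trade-off between energy savings and response time that the choice of idleness threshold has to balance.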