of the short storage duration of these data, according to the data reliability model that will
be described in Chapter 4, one replica would suffice to meet the requirements of data
reliability and storage duration. For this type of data, relatively low reliability assurance
can be applied, and recovery ability is most likely unnecessary. However, under the
conventional three-replica strategy, both types of data are stored with the same number of
replicas, which is inappropriate for both. For the former type of data, assuring data
reliability with three replicas incurs a high storage cost, especially when large amounts of
data are stored. For the latter type of data, the two additional replicas are simply unneeded
and incur unnecessary extra storage cost.
To reduce the Cloud storage cost while meeting the data reliability requirement,
both of the abovementioned major factors must be considered. A new mechanism for
data storage and data reliability assurance should be proposed to replace the
conventional three-replica replication strategy.
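The cost argument above can be made concrete with simple arithmetic. The following is a
minimal sketch, not part of the original text: the function name, the price, and the data set
size are all hypothetical values chosen for illustration, and storage cost is assumed to scale
linearly with the number of replicas.

```python
def replica_storage_cost(size_gb, replicas, months, price_per_gb_month):
    """Total cost of keeping `replicas` full copies of a data set for `months`.

    Assumes cost grows linearly with size, replica count, and duration.
    """
    return size_gb * replicas * months * price_per_gb_month


# Hypothetical figures, chosen only for illustration.
PRICE = 0.02       # assumed price in dollars per GB per month
SIZE_GB = 10_000   # assumed size of a short-lived data set

three_replica_cost = replica_storage_cost(SIZE_GB, 3, 1, PRICE)
one_replica_cost = replica_storage_cost(SIZE_GB, 1, 1, PRICE)

# Share of the three-replica bill spent on the two additional replicas.
extra_fraction = (three_replica_cost - one_replica_cost) / three_replica_cost
```

Under these assumptions, two thirds of the three-replica cost goes to the additional replicas,
regardless of the price or the data set size. If one replica already meets the reliability
requirement, that share is pure overhead.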
3.2.2 Data storage devices and schemes
In current Cloud technology, the disk is the primary storage device, with other storage
devices making up only a minor proportion. In Section 2.1, we presented some research
studies on storage devices such as magnetic tape and solid-state drives, and briefly
introduced the features of these devices. From the perspective of data reliability
management, the primary difference among these storage devices is the failure rate pattern.
For example, compared to the disk failure rate pattern, the failure rate pattern of magnetic
tapes could have a similar shape but evolve much more slowly, while the failure rate pattern
of solid-state drives could be quite different. In this book, the research is conducted
primarily on a disk-based Cloud storage environment. However, by incorporating a variable
failure rate pattern into the data reliability model, data reliability assurance on different
storage devices could also be addressed. To facilitate the presentation, we use the term
“disk” to describe all kinds of storage devices in the rest of the book.
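To illustrate how a variable failure rate pattern can enter a reliability calculation, the
sketch below uses the standard survival formula R(t) = exp(-∫λ(u)du) with a piecewise-constant
failure rate. This is a generic textbook formulation and not the data reliability model of
Chapter 4, which is not shown in this section. The function names and all numeric patterns are
hypothetical, and replicas are assumed to fail independently with no recovery.

```python
import math


def survival_probability(rate_segments, duration):
    """Probability that one device survives `duration` (in years).

    rate_segments: ordered list of (segment_length_years, failures_per_year).
    The last segment's rate is reused for any time beyond the listed segments.
    """
    hazard = 0.0
    remaining = duration
    for length, rate in rate_segments:
        used = min(length, remaining)
        hazard += rate * used          # accumulate the integral of the rate
        remaining -= used
        if remaining <= 0:
            break
    if remaining > 0:
        hazard += rate_segments[-1][1] * remaining
    return math.exp(-hazard)


def replicated_reliability(rate_segments, duration, replicas):
    """Probability that at least one of `replicas` independent copies survives."""
    p_loss = 1.0 - survival_probability(rate_segments, duration)
    return 1.0 - p_loss ** replicas


# Hypothetical failure rate patterns (illustrative numbers only).
DISK_PATTERN = [(0.25, 0.06), (2.75, 0.02), (2.0, 0.08)]
TAPE_PATTERN = [(1.0, 0.06), (11.0, 0.02), (8.0, 0.08)]  # same shape, slower change

disk_one_replica = replicated_reliability(DISK_PATTERN, 1.0, 1)
disk_three_replicas = replicated_reliability(DISK_PATTERN, 1.0, 3)
```

Swapping DISK_PATTERN for TAPE_PATTERN changes only the input pattern, not the calculation,
which is the sense in which a single model with a variable failure rate can cover different
storage devices.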
In addition to analyzing storage devices in the Cloud, research on Cloud storage and data
reliability assurance also requires that the storage scheme of the Cloud be determined. As
mentioned in Section 2.2, there are two major data storage schemes in existing distributed
storage systems: the replication-based data storage scheme and the erasure coding-based data
storage scheme. Rather than the erasure coding-based data storage scheme, our research
focuses on the Cloud with a direct replication-based data storage scheme. The reason for
this is twofold:
First, for pulsar searching and a wide range of similar data-intensive applications that involve
intensive large-scale data processing and generation, applying the erasure coding approaches
currently used in some Cloud storage systems is not practical. For these applications, the
term data-intensive means not only the requirement of big data storage ability, but also the
requirement of processing data with high performance and low data access delay. In an erasure
coding-based data storage environment, the computation and time overheads of coding and
decoding the data are so high that the overall cost saving from reduced storage cost is
significantly weakened.
Second, the replication-based data storage scheme is currently the most widely used Cloud
storage scheme, which is applied by the major Cloud service providers. By conducting
 