regardless of the number of the threads (higher latency dominates the probability of
stale reads), while when the latency is small, the access pattern has more influence
on the probability.
10.7 CONCLUSION
This chapter addresses a major open issue in cloud storage systems: the manage-
ment of consistency for replicated data. Despite a plethora of cloud storage sys-
tems available today, data consistency schemes are still far from satisfactory. We
take this opportunity to ponder the CAP theorem 13 years after its formulation
and discuss its implications in the modern context of cloud computing. The ten-
sion among consistency, availability, and partition tolerance has been handled in
various ways in existing distributed storage systems (e.g., by relaxing consistency at
the wide-area level). We therefore provide an overview of the major consistency models
and approaches used for providing scalable, yet highly available services on clouds.
We categorize the consistency models according to their guarantees into
(1) strong forms of consistency, including linearizability and serializability,
and (2) weaker forms of consistency, including eventual, causal, and timeline
consistency. For the weaker consistency models, we elaborate on what additional
operation ordering is applied to handle conflicts. Cloud storage is foundational
to cloud computing because it provides a backend for hosting not only user data but
also the system-level data needed by cloud services. We survey the state-of-the-art
cloud storage systems used by the main cloud vendors (i.e., Amazon, Google, and
Facebook). In addition to a general presentation of these systems' architectures and
use cases, we discuss the consistency model employed by each cloud storage system.
The survey clarifies the mapping between the consistency technique applied
and the target requirements of the applications using these cloud solutions.
Moreover, to handle the tremendous size of Big Data, the scale of cloud systems
is growing rapidly and cloud applications are diversifying significantly
(e.g., in access patterns and diurnal/monthly loads). We advocate self-adaptivity as
a key means of managing the tradeoffs that user applications must handle.
We review several approaches to adaptive consistency that provide flexible consistency
management, allowing users to reduce performance overhead and monetary cost
when data are spread across geographically distributed sites. Then, we discuss
in detail our adaptive consistency solution Harmony: a novel approach that handles
data consistency in cloud storage adaptively by choosing the most appropriate con-
sistency level dynamically at run time. In Harmony, we collect relevant information
about the storage system to estimate the stale read rate when consistency is eventual,
and make a decision accordingly. To be application-adaptive, Harmony takes into
account the application's needs expressed by the stale read rate that can be tolerated.
Harmony is evaluated with the Cassandra cloud storage system on Amazon EC2. The chapter
helps to fill the gap between the conceptual overview of consistency models and
how they are used in practice in real cloud systems. This chapter also emphasizes
the importance of adaptive consistency approaches in coping with cloud and data
scale and application diversity, and offers insights into designing new consistency
approaches for Big Data storage systems.
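
As a rough illustration of the adaptive decision summarized above for Harmony, the following Python sketch compares an estimated stale read rate against the rate the application tolerates and picks a consistency level accordingly. It is a minimal sketch under stated assumptions, not Harmony's actual estimation model or code: the function name, the `Decision` type, and the two-level choice are hypothetical, and the estimate is taken as an input rather than computed. The level names mirror Cassandra's ONE and QUORUM consistency levels.

```python
from dataclasses import dataclass

# Hypothetical labels for the two choices; the strings mirror
# Cassandra's ONE (eventual) and QUORUM (stronger) consistency levels.
EVENTUAL = "ONE"
STRONGER = "QUORUM"


@dataclass(frozen=True)
class Decision:
    """Outcome of one adaptive decision (illustrative type, not from the chapter)."""
    level: str
    estimated_stale_rate: float
    tolerated_stale_rate: float


def choose_consistency_level(estimated_stale_rate: float,
                             tolerated_stale_rate: float) -> Decision:
    """Pick a consistency level from an estimated and a tolerated stale read rate.

    Both rates are fractions in [0, 1]. How the estimate is obtained from
    monitoring data is outside this sketch.
    """
    for name, value in (("estimated_stale_rate", estimated_stale_rate),
                        ("tolerated_stale_rate", tolerated_stale_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    # Stay with eventual consistency while the estimate is within tolerance;
    # otherwise fall back to a stronger level.
    if estimated_stale_rate <= tolerated_stale_rate:
        level = EVENTUAL
    else:
        level = STRONGER
    return Decision(level, estimated_stale_rate, tolerated_stale_rate)


if __name__ == "__main__":
    # An application tolerating 20% stale reads, under two monitored conditions.
    print(choose_consistency_level(0.05, 0.20).level)
    print(choose_consistency_level(0.40, 0.20).level)
```

In a running system such a function would be called periodically with a fresh estimate, so that the consistency level follows changes in latency and access pattern at run time.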