Consistency Management in Cloud Storage Systems - Large Scale and Big Data: Processing and Management


regardless of the number of threads (higher latency dominates the probability of

stale reads), while when the latency is small, the access pattern has more influence

on the probability.

10.7 CONCLUSION

This chapter addresses a major open issue in cloud storage systems: the manage-

ment of consistency for replicated data. Despite a plethora of cloud storage sys-

tems available today, data consistency schemes are still far from satisfactory. We

take this opportunity to ponder the CAP theorem 13 years after its formulation

and discuss its implications in the modern context of cloud computing. The ten-

sion among consistency, availability, and partition tolerance has been handled in

various ways in existing distributed storage systems (e.g., by relaxing consistency at

the wide-area level). We therefore provide an overview of the major consistency models

and approaches used for providing scalable, yet highly available services on clouds.

We categorize the consistency models according to their consistency guarantees

into (1) strong forms of consistency, including linearizability and serializability,

and (2) weaker forms of consistency, including eventual, causal, and timeline consis-

tency. For the weaker consistency models, we elaborate on what additional opera-

tion ordering is applied to handle conflict situations. Cloud storage is foundational

to cloud computing because it provides a backend for hosting not only user data but

also the system-level data needed by cloud services. We survey the state-of-the-art

cloud storage systems used by the main cloud vendors (namely Amazon, Google, and

Facebook). In addition to a general presentation of these systems' architectures and

use cases, we discuss the consistency model employed by each cloud storage sys-

tem. The survey helps clarify the mapping between the applied consistency

technique and the target requirements of the applications using these cloud solutions.

Moreover, to handle the tremendous size of Big Data, the scale of cloud sys-

tems is increasing rapidly, and cloud applications are diversifying significantly

(e.g., in access patterns and diurnal/monthly loads). We advocate self-adaptivity as

a key means to approach the tradeoffs that must be handled by the user applications.

We review several adaptive consistency approaches that provide flexible consis-

tency management for users to reduce performance overhead and monetary cost

when data are spread across geographically distributed sites. Then, we discuss

in detail our adaptive consistency solution Harmony: a novel approach that handles

data consistency in cloud storage adaptively by choosing the most appropriate con-

sistency level dynamically at run time. In Harmony, we collect relevant information

about the storage system to estimate the stale read rate when consistency is eventual,

and make a decision accordingly. To be application-adaptive, Harmony takes into

account the application's needs, expressed as the stale read rate that can be tolerated.

Harmony is evaluated with the Cassandra cloud storage on Amazon EC2. The chap-

ter helps to fill the gap between the conceptual overview of consistency models and

how they are used in practice in real cloud systems. This chapter also emphasizes

the importance of adaptive consistency approaches in coping with cloud/data

scale and application diversity, and offers insights into designing new consistency

approaches for Big Data storage systems.
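
The run-time decision summarized above for Harmony (estimate the stale read rate under eventual consistency, then compare it with the rate the application can tolerate) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `StorageStats` fields, the `toy_stale_rate` function, and the choice between the Cassandra levels `ONE` and `QUORUM` are hypothetical, and the toy estimator is a stand-in, not Harmony's actual estimation model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StorageStats:
    """Hypothetical monitored metrics; the field names are illustrative only."""
    write_rate: float        # writes per second observed on the data
    propagation_time: float  # seconds for an update to reach all replicas


def toy_stale_rate(stats: StorageStats) -> float:
    """Toy stand-in for a stale read estimator (NOT Harmony's model).

    It grows with both the write rate and the propagation delay and is
    capped at 1.0 so that it can be read as a rate.
    """
    return min(1.0, stats.write_rate * stats.propagation_time)


def choose_consistency_level(
    stats: StorageStats,
    tolerated_stale_rate: float,
    estimate: Callable[[StorageStats], float] = toy_stale_rate,
) -> str:
    """Pick a read consistency level for the next period.

    Eventual consistency ("ONE") is kept while the estimated stale read
    rate stays within what the application tolerates; otherwise a stronger
    level ("QUORUM", an assumed choice) is selected.
    """
    if not 0.0 <= tolerated_stale_rate <= 1.0:
        raise ValueError("tolerated_stale_rate must be within [0, 1]")
    estimated = estimate(stats)
    return "ONE" if estimated <= tolerated_stale_rate else "QUORUM"


if __name__ == "__main__":
    quiet = StorageStats(write_rate=2.0, propagation_time=0.01)
    busy = StorageStats(write_rate=50.0, propagation_time=0.01)
    print(choose_consistency_level(quiet, tolerated_stale_rate=0.05))
    print(choose_consistency_level(busy, tolerated_stale_rate=0.05))
```

Passing the estimator as a parameter keeps the decision rule separate from the estimation model, so a more faithful estimator could be substituted without changing the comparison against the tolerated rate.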
