• For enhanced performance, user requests can be redirected to other replicas within the same datacenter (but in different racks) to avoid overloading a single copy of the data, thereby improving performance under heavy load.
• For high availability, since failures and network partitions are common in large-scale distributed systems, replication avoids single points of failure.
A particularly challenging issue that arises in storage systems with geographically distributed data replication is how to ensure a consistent state across all replicas. Ensuring strong consistency by means of synchronous replication introduces a significant performance overhead, due to the high network latencies between datacenters (the average round-trip latency between Amazon sites ranges from 0.3 ms within the same site to 380 ms between different sites [8]). Consequently, several weaker consistency models have been implemented (e.g., causal consistency, eventual consistency, timeline consistency). Such relaxed consistency models allow the system to return stale data at some points in time.
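To make the overhead concrete, the short Python sketch below compares the write acknowledgement latency of synchronous replication (wait for every replica) with a relaxed scheme that acknowledges after the closest replica, using the round-trip latencies quoted above [8]; the replica layout itself is an illustrative assumption.

```python
# Back-of-the-envelope comparison of write acknowledgement latency,
# using the round-trip latencies quoted above (0.3 ms intra-site,
# 380 ms inter-site [8]). The replica placement is a made-up example.

# Round-trip latency (ms) from the coordinator to each replica.
replica_rtts_ms = {
    "replica-local":   0.3,    # same site as the coordinator
    "replica-remote1": 380.0,  # geographically distant datacenter
    "replica-remote2": 380.0,  # geographically distant datacenter
}

def sync_write_latency(rtts):
    """Strong consistency: the write is acknowledged only after every
    replica has confirmed it, so latency is the slowest round trip."""
    return max(rtts.values())

def local_write_latency(rtts):
    """Relaxed consistency: acknowledge as soon as the closest replica
    confirms; remote replicas are updated asynchronously."""
    return min(rtts.values())

print(f"synchronous (all replicas): {sync_write_latency(replica_rtts_ms):.1f} ms")
print(f"local ack (eventual):       {local_write_latency(replica_rtts_ms):.1f} ms")
```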
Many cloud storage services opt for weaker consistency models to achieve better availability and performance: such models allow cloud storage systems to replicate their data and scale out their infrastructure across multiple geographically distributed datacenters, coping with the ever-growing size of Big Data and the increasing number of users worldwide, while still meeting users' performance requirements and data availability guarantees. For example, Facebook uses the eventually consistent storage system Cassandra to host data for more than 800 million active users [9]. This comes at the cost of a high probability of reading stale data (i.e., the replicas involved in a read may not always have the most recent write). As shown in [10], under heavy reads and writes some of these systems may return up to 66.61% stale reads, although this may be tolerable for users in the case of social networks.
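Systems such as Cassandra expose this trade-off as a per-request consistency level. The sketch below shows, with the DataStax Python driver (cassandra-driver), a weak read served by a single replica versus a quorum read; the contact point, keyspace, and table names are illustrative assumptions.

```python
# Sketch of tunable consistency with the DataStax Cassandra Python driver.
# Host address, keyspace, and table names are hypothetical.
from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.1"])          # contact point (example address)
session = cluster.connect("social_app")  # hypothetical keyspace

# Weak consistency: a single replica answers, so the read is fast and
# highly available, but it may return a stale value.
fast_read = SimpleStatement(
    "SELECT status FROM user_timeline WHERE user_id = %s",
    consistency_level=ConsistencyLevel.ONE,
)

# Stronger consistency: a majority of replicas must answer, reducing the
# chance of staleness at the cost of higher latency.
safer_read = SimpleStatement(
    "SELECT status FROM user_timeline WHERE user_id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)

fast = session.execute(fast_read, ("alice",)).one()
safe = session.execute(safer_read, ("alice",)).one()
```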
With the ever-growing diversity in the access patterns of cloud applications, along with the unpredictable diurnal and monthly changes in service loads and the variation in network latency (both intra- and inter-site), static and traditional consistency solutions are not adequate for the cloud. With this in mind, several adaptive consistency solutions have been introduced, among them our automated and self-adaptive approach Harmony [11], which tunes the consistency level at run-time to improve the performance and availability of cloud storage systems while maintaining a low fraction of stale reads. To be application-adaptive, Harmony takes into account the application's needs, expressed as the rate of stale reads that can be tolerated. At run-time, Harmony collects relevant information about the storage system (i.e., the network latency, which directly affects update propagation to the replicas) and about the application's demands (i.e., the frequency of reads, writes, and updates) to estimate the stale read rate and make a decision accordingly (i.e., change the number of replicas involved in the read operation).
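The simplified sketch below illustrates the shape of such an adaptive decision loop: estimate the stale read rate from the observed write rate and propagation delay, then increase the number of replicas involved in reads only when the application's tolerated rate would be exceeded. The estimator and the numbers used are illustrative assumptions and do not reproduce Harmony's actual probabilistic model [11].

```python
import math

def estimate_stale_read_rate(write_rate_per_s, propagation_delay_s):
    """Illustrative estimator: a read is assumed stale if it arrives while a
    write is still propagating to the replicas. With Poisson write arrivals of
    rate lambda_w, that probability is 1 - exp(-lambda_w * T_p).
    (Harmony's actual model is more detailed [11].)"""
    return 1.0 - math.exp(-write_rate_per_s * propagation_delay_s)

def choose_read_replicas(tolerated_stale_rate, write_rate_per_s,
                         propagation_delay_s, total_replicas):
    """Pick the number of replicas to involve in a read: stay at one replica
    (fast, weakly consistent) while the estimated staleness is acceptable,
    otherwise involve more replicas to lower the chance of a stale read."""
    estimated = estimate_stale_read_rate(write_rate_per_s, propagation_delay_s)
    replicas = 1
    while estimated > tolerated_stale_rate and replicas < total_replicas:
        replicas += 1
        # Involving more replicas reduces the chance that *all* of them
        # miss the latest write (crude independence assumption).
        estimated *= estimate_stale_read_rate(write_rate_per_s,
                                              propagation_delay_s)
    return replicas

# Example: the application tolerates 5% stale reads; the system currently
# observes 50 writes/s and a 40 ms propagation delay across 3 replicas.
print(choose_read_replicas(0.05, 50.0, 0.040, 3))
```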
In summary, it is useful to take a step back, consider the variety of consistency solutions offered by different cloud storage systems, and describe them in a unified way, putting the different uses and types of consistency in perspective; this is the main purpose of this chapter. The rest of this chapter is organized as follows: in Section 10.2, we briefly introduce the CAP theorem and its implications for cloud systems. Then, we present the different types of consistency in Section 10.3. After that, we briefly introduce the