responsible for shipping the updates and data definition language operations to the
secondary replicas.
Since some partitions may experience higher load than others, the simple technique of balancing the number of primary and secondary partitions per node might not balance the loads. The system can rebalance dynamically by using the failover mechanism to tell a secondary on a lightly loaded server to become the primary, either demoting the former primary to a secondary or moving it to another server. A keyed table group can be partitioned dynamically: if a partition exceeds the maximum allowable partition size (in bytes or in the amount of operational load it receives), it is split into two partitions. In general, the size of each hosted SQL Azure database cannot exceed the limit of 50 GB.
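The size-triggered split described above can be sketched in a few lines. This is an illustrative model only, not the SQL Azure implementation; the names (Partition, maybe_split, the size accounting) are all hypothetical, and the byte threshold is a stand-in for the real limit.

```python
class Partition:
    """Toy key-range partition [low, high) holding an in-memory row map."""

    def __init__(self, low, high, rows=None):
        self.low, self.high = low, high
        self.rows = rows if rows is not None else {}

    def size_bytes(self):
        # Rough stand-in for the system's real storage accounting.
        return sum(len(str(k)) + len(str(v)) for k, v in self.rows.items())

    def split(self):
        """Split this partition at the median key into two halves."""
        keys = sorted(self.rows)
        mid_key = keys[len(keys) // 2]
        left = Partition(self.low, mid_key,
                         {k: v for k, v in self.rows.items() if k < mid_key})
        right = Partition(mid_key, self.high,
                          {k: v for k, v in self.rows.items() if k >= mid_key})
        return left, right


def maybe_split(partition, max_bytes):
    """Return one partition unchanged, or two if it exceeds the size cap."""
    if partition.size_bytes() > max_bytes:
        return partition.split()
    return (partition,)
```

In a real system the split would also trigger placement of the two new partitions (and their secondaries) across servers, feeding back into the load-balancing mechanism described above.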
9.5 WEB SCALE DATA MANAGEMENT: TRADEOFFS
An important issue in designing large-scale data management applications is to avoid the mistake of trying to be "everything for everyone." As with many types of computer systems, no one system can be best for all workloads, and different systems make different tradeoffs to optimize for different applications. Therefore, one of the most challenging aspects of these applications is to identify the most important features of the target application domain and to decide on the various design tradeoffs, which immediately lead to performance tradeoffs. To tackle this problem, Jim Gray came up with the heuristic rule of "20 queries" [38]. The main idea of this heuristic is that on each project, we need to identify the 20 most important questions the users want the data system to answer. He argued that five questions are not enough to reveal a broader pattern, while a hundred questions would result in a lack of focus.
In general, it is hard to maintain ACID guarantees in the face of data replication
over large geographic distances. The CAP theorem [15,34] shows that a shared-data system can choose at most two of the following three properties: Consistency (all
records are the same in all replicas), Availability (a replica failure does not prevent
the system from continuing to operate), and tolerance to Partitions (the system
still functions when distributed replicas cannot talk to each other). When data
is replicated over a wide area, this essentially leaves just consistency and avail-
ability for a system to choose between. Thus, the C (consistency) part of ACID is
typically compromised to yield reasonable system availability [2]. Therefore, most cloud data management systems overcome the difficulties of distributed replication by relaxing the ACID guarantees. In particular, they implement various forms of weaker consistency models (e.g., eventual consistency, timeline consistency, session consistency [60]), so that all replicas do not have to agree on the same value of a data item at every moment in time. Hence, NoSQL systems can be classified, based on which properties of the CAP theorem they support, into three categories:
CA systems : Consistent and highly available, but not partition-tolerant
CP systems : Consistent and partition-tolerant, but not highly available
AP systems : Highly available and partition-tolerant, but not consistent
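The eventual-consistency relaxation mentioned above can be illustrated with a toy store in which writes go to a primary replica and are shipped to secondaries asynchronously: until propagation runs, a read against a secondary may return a stale (or missing) value. All class and method names here are illustrative, not any particular system's API.

```python
class Replica:
    """A single replica holding its own copy of the data."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Primary/secondary store with asynchronous (deferred) replication."""

    def __init__(self, n_secondaries=2):
        self.primary = Replica()
        self.secondaries = [Replica() for _ in range(n_secondaries)]
        self.pending = []  # replication log not yet shipped to secondaries

    def write(self, key, value):
        # The write is acknowledged after updating only the primary;
        # replication happens later, which is what weakens consistency.
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index=0):
        # A client reading a secondary may observe a stale value.
        return self.secondaries[replica_index].data.get(key)

    def propagate(self):
        # Anti-entropy step: apply the pending log so all replicas
        # eventually converge to the same values.
        for key, value in self.pending:
            for replica in self.secondaries:
                replica.data[key] = value
        self.pending.clear()
```

In CAP terms this sketch behaves like an AP system: reads and writes succeed even while secondaries lag behind, at the cost of replicas temporarily disagreeing until propagate() runs.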