Jim Gray came up with the heuristic rule of “20 queries” [151]. The main idea of
this heuristic is that, on each project, we need to identify the 20 most important
questions the user wants the data system to answer. He argued that five questions
are not enough to reveal a broader pattern, while a hundred questions would result
in a lack of focus.
In general, it is hard to maintain ACID guarantees in the face of data replication
over large geographic distances. The CAP theorem [86, 138] shows that a shared-data
system can provide at most two of three properties: Consistency (all replicas hold
the same value for a record), Availability (a replica failure does not prevent
the system from continuing to operate), and tolerance to Partitions (the system
still functions when distributed replicas cannot communicate with each other). When
data is replicated over a wide area, network partitions cannot be ruled out, so
partition tolerance is effectively mandatory; this leaves consistency and
availability as the two properties a system must choose between. Thus, the “C”
(consistency) part of ACID is typically compromised to yield reasonable system
availability [56]. Consequently, most cloud data management systems overcome the
difficulties of distributed replication by relaxing the ACID guarantees of the
system. In particular, they implement various forms of weaker consistency models
(e.g., eventual consistency, timeline consistency, session consistency [219]) so
that all replicas do not have to agree on the same value of a data item at every
moment in time. Based on which properties of the CAP theorem they support, NoSQL
systems can be classified into three categories:
• CA systems: Consistent and highly available, but not partition-tolerant.
• CP systems: Consistent and partition-tolerant, but not highly available.
• AP systems: Highly available and partition-tolerant, but not consistent.
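The weaker consistency models described above, and the behaviour of the AP category, can be illustrated with a toy sketch of eventual consistency. It assumes two in-memory replicas, client-supplied timestamps, and a last-writer-wins (LWW) rule for reconciling conflicting versions. LWW is only one of several possible conflict-resolution strategies and is not prescribed by the text, and the `Replica` class and its `write`, `read`, and `anti_entropy` methods are hypothetical names chosen for illustration. During a partition each replica keeps accepting writes (availability) and the two temporarily disagree; once they can exchange state again, an anti-entropy step makes them converge.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy key-value replica (hypothetical) that keeps a (timestamp, value)
    version per key and resolves conflicts with last-writer-wins (LWW)."""
    name: str
    store: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        # Accept the write locally without waiting for other replicas;
        # this keeps the replica available even during a partition.
        self._apply(key, (ts, value))

    def read(self, key):
        # Return the locally known value, which may be stale.
        version = self.store.get(key)
        return None if version is None else version[1]

    def _apply(self, key, version):
        # LWW: keep the version with the larger timestamp. Comparing the
        # (ts, value) tuples breaks ties by value, so all replicas pick
        # the same winner.
        current = self.store.get(key)
        if current is None or version > current:
            self.store[key] = version

    def anti_entropy(self, other):
        # Exchange versions in both directions; afterwards both replicas
        # hold the same winning version for every key.
        for key, version in list(other.store.items()):
            self._apply(key, version)
        for key, version in list(self.store.items()):
            other._apply(key, version)


a, b = Replica("A"), Replica("B")
# Network partition: each side accepts a conflicting write for "x".
a.write("x", "red", ts=1)
b.write("x", "blue", ts=2)
print(a.read("x"), b.read("x"))  # prints: red blue (replicas disagree)

# Partition heals; the replicas reconcile and converge.
a.anti_entropy(b)
print(a.read("x"), b.read("x"))  # prints: blue blue
```

The design choice in this sketch is to never block a write on remote replicas, which is what keeps each replica available; the price is that a read during the partition may return a stale or conflicting value, and LWW silently discards the losing write.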
In principle, choosing an adequate NoSQL system (from the very wide spectrum of
available choices) whose design decisions best fit the requirements of a software
application is not a trivial task and requires careful consideration. Table 3.1
provides an overview of the design decisions of several sample NoSQL systems.
In practice, transactional data management applications (e.g., banking, stock
trading, supply chain management), which rely on the ACID guarantees that
databases provide, tend to be fairly write-intensive or require microsecond
precision, and they are less obvious candidates for the cloud environment until the
cost and latency of wide-area data transfer decrease. Cooper et al. [112] discussed
the tradeoffs facing cloud data management systems as follows:
• Read performance versus write performance: Log-structured systems that store
only update deltas can be very inefficient for reads if the data is modified over
time, because the current record must be reconstructed from its deltas. On the
other hand, writing the complete record to the log on each update avoids the cost
of reconstruction at read time, but there is a correspondingly higher cost on
update. Unless all data fits in memory, random I/O to the disk is needed to serve
reads (as opposed to scans). However, for write operations, much higher throughput
can be achieved by appending all updates to a sequential disk-based log.
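The read/write trade-off described by Cooper et al. can be sketched with two toy append-only logs. This is a minimal illustration under stated assumptions: an in-memory Python list stands in for the sequential on-disk log, a per-key index is kept in memory, and the number of fields written and log entries fetched per read serve as rough proxies for I/O cost; disk seeks are not modelled. The class names `DeltaLog` and `FullRecordLog` and the `fields_written` counter are hypothetical.

```python
class DeltaLog:
    """Hypothetical append-only log storing only the fields each update changes."""

    def __init__(self):
        self.log = []            # stand-in for a sequential on-disk log
        self.index = {}          # key -> positions of all entries for that key
        self.fields_written = 0  # rough proxy for write cost

    def update(self, key, changes):
        # Cheap write: append only the delta.
        self.log.append((key, dict(changes)))
        self.index.setdefault(key, []).append(len(self.log) - 1)
        self.fields_written += len(changes)

    def read(self, key):
        # Expensive read: fetch every delta for the key and replay them
        # oldest-first to reconstruct the current record.
        record = {}
        positions = self.index.get(key, [])
        for pos in positions:
            record.update(self.log[pos][1])
        return record, len(positions)  # (record, log entries fetched)


class FullRecordLog:
    """Hypothetical append-only log storing the complete record on every update."""

    def __init__(self):
        self.log = []
        self.index = {}          # key -> position of the newest full record
        self.fields_written = 0

    def update(self, key, changes):
        # Costlier write: merge the changes into the current record and
        # append the whole record, not just the delta.
        current, _ = self.read(key)
        record = {**current, **changes}
        self.log.append((key, record))
        self.index[key] = len(self.log) - 1
        self.fields_written += len(record)

    def read(self, key):
        # Cheap read: a single lookup of the newest full copy.
        pos = self.index.get(key)
        if pos is None:
            return {}, 0
        return dict(self.log[pos][1]), 1


deltas, full = DeltaLog(), FullRecordLog()
for store in (deltas, full):
    store.update("user:1", {"name": "Ada", "city": "London", "visits": 0})
    for n in range(1, 4):
        store.update("user:1", {"visits": n})

print(deltas.read("user:1"))  # ({'name': 'Ada', 'city': 'London', 'visits': 3}, 4)
print(full.read("user:1"))    # ({'name': 'Ada', 'city': 'London', 'visits': 3}, 1)
print(deltas.fields_written, full.fields_written)  # 6 12
```

Both variants write sequentially; they differ in where the work lands. The delta log writes fewer fields but must fetch and replay every delta on a read, so read cost grows with the number of updates to a record, while the full-record log pays extra on each write to keep reads at a single lookup.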