These are fundamental properties in the cloud, and they interact strongly with consistency guarantees. In fact, scalability and strong consistency are somewhat antagonistic: if you want strong consistency, your system cannot scale, and if you want scalability, you have to relax consistency or impose restrictions on your application (for example, keeping each transaction's updates within a single data store).
In [8], Agrawal et al. define two ways to ensure scalability. On the one hand, data fusion combines multiple small data granules to provide transactional guarantees over larger data granules; this is the approach of Google's Bigtable [5], Amazon's Dynamo [7] and Yahoo's PNUTS [6]. On the other hand, data fission splits a database into independent shards or partitions and provides transactional guarantees only within these shards (Relational Cloud [24] and ElasTraS [25]), as the sketch below illustrates.
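To make the data-fission restriction concrete, here is a minimal Python sketch of a store that routes keys to shards by an entity prefix and accepts a transaction only when all of its updates fall on a single shard. The class name, the key scheme ("user:42:name") and the routing rule are hypothetical illustrations, not the interfaces of Relational Cloud or ElasTraS.

    class ShardLocalStore:
        def __init__(self, num_shards):
            self.num_shards = num_shards
            self.shards = [dict() for _ in range(num_shards)]

        def shard_of(self, key):
            # Route by the entity prefix ("user:42") so that all rows of
            # one entity land on the same shard.
            entity = ":".join(key.split(":")[:2])
            return hash(entity) % self.num_shards

        def transaction(self, updates):
            # Apply updates atomically, but only if every key lives on the
            # same shard: the data-fission restriction on transactions.
            shard_ids = {self.shard_of(k) for k in updates}
            if len(shard_ids) != 1:
                raise ValueError("multi-shard transactions are not supported")
            self.shards[shard_ids.pop()].update(updates)

    store = ShardLocalStore(num_shards=4)
    # Both keys share the "user:42" prefix, so the transaction is accepted.
    store.transaction({"user:42:name": "Ada", "user:42:mail": "ada@example.org"})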
Live database migration is a key component for ensuring elasticity. In a shared-disk architecture (Bigtable [5] and ElasTraS [25]), only the database cache has to be migrated: persistent data is stored in network-attached storage and does not need to move. In a shared-nothing architecture (Zephyr [26]), the persistent data, which is considerably larger than the cache, must be migrated as well; the sketch below contrasts the two cases.
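This contrast can be sketched in a few lines of Python; the Node class and migrate_tenant function below are hypothetical simplifications that ignore synchronization, ownership handover and in-flight transactions.

    class Node:
        def __init__(self):
            self.cache = {}  # hot data kept in memory
            self.disk = {}   # local persistent data (shared-nothing only)

    def migrate_tenant(tenant, src, dst, shared_storage):
        # In both architectures the warm cache moves (or is rebuilt) so
        # that the destination node does not start cold.
        dst.cache[tenant] = src.cache.pop(tenant, {})
        if not shared_storage:
            # Shared-nothing (Zephyr's setting): persistent data lives on
            # the source node and must be copied too; it is typically far
            # larger than the cache.
            dst.disk[tenant] = src.disk.pop(tenant)
        # Shared-disk (Bigtable/ElasTraS style): persistent data stays in
        # network-attached storage, so only ownership metadata changes.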
Consistency has been widely studied in the context of transactions (see [27] for a general presentation of distributed transaction processing). For relational DBMSs, a strong consistency model has emerged under the name ACID (Atomicity, Consistency, Isolation and Durability). This model is difficult to scale, which is why large-scale web applications have popularized a new kind of consistency model, the BASE model (Basically Available, Soft state and Eventual consistency). BASE ensures scalability, but at the cost of more complex application code, because programmers have to ensure part of the consistency manually. For example, applications may restrict themselves to accessing data on a single node, avoiding costly distributed protocols to synchronize nodes, and they must reconcile replicas that have diverged, as the sketch below illustrates.
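As an illustration of the consistency work that BASE leaves to the application, the sketch below merges two diverged replicas of per-key counters with an application-chosen rule (keep the maximum value seen per key). The data layout and the merge rule are hypothetical; they stand in for whatever reconciliation logic a given application needs.

    def merge_counters(replica_a, replica_b):
        # Application-defined reconciliation: for each key, keep the
        # largest observed value (a grow-only counter rule).
        merged = dict(replica_a)
        for key, value in replica_b.items():
            merged[key] = max(merged.get(key, 0), value)
        return merged

    # Two replicas accepted writes independently while partitioned...
    a = {"likes:post1": 10, "likes:post2": 3}
    b = {"likes:post1": 12}
    # ...and the application, not the store, decides how to reconcile.
    print(merge_counters(a, b))  # {'likes:post1': 12, 'likes:post2': 3}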
Several works try to help programmers with this task. The authors of [24] developed a workload-aware partitioner that uses graph partitioning to analyze complex query workloads and proposes a mapping of data items to nodes that minimizes the number of multi-node transactions and statements; the sketch below outlines the idea.
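The sketch below gives the flavor of this approach on a toy workload: it builds a co-access graph whose edge weights count how often two items appear in the same transaction, then places items greedily under a capacity bound. Everything here is an illustrative stand-in for the technique of [24], which relies on a real graph partitioner.

    import math
    from collections import defaultdict
    from itertools import combinations

    def co_access_graph(workload):
        # Edge weight = number of transactions in which two items co-occur.
        weights = defaultdict(int)
        for txn in workload:
            for a, b in combinations(sorted(set(txn)), 2):
                weights[(a, b)] += 1
        return weights

    def greedy_partition(items, weights, num_nodes):
        # Place each item with its most co-accessed, already-placed
        # neighbors, subject to a simple per-node capacity bound.
        capacity = math.ceil(len(items) / num_nodes)
        placement, load = {}, [0] * num_nodes
        for item in items:
            scores = [0] * num_nodes
            for (a, b), w in weights.items():
                other = b if a == item else a if b == item else None
                if other in placement:
                    scores[placement[other]] += w
            best = max((n for n in range(num_nodes) if load[n] < capacity),
                       key=lambda n: scores[n])
            placement[item] = best
            load[best] += 1
        return placement

    workload = [["u1", "u2"], ["u1", "u2"], ["u3", "u4"], ["u3", "u4"], ["u2", "u3"]]
    weights = co_access_graph(workload)
    print(greedy_partition(["u1", "u2", "u3", "u4"], weights, 2))
    # {'u1': 0, 'u2': 0, 'u3': 1, 'u4': 1}: only one light edge crosses nodes.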
[28] is a recent work addressing the problem of client-centric consistency on top of eventually consistent distributed data stores (Amazon S3, for example). It consists of a middleware service that runs on the same server as the application and provides the behavior of a causally consistent data store, even in the presence of failures or concurrent updates. This service uses vector clocks and client-side caching to ensure the client-centric properties; a sketch follows below. The main interest of this proposal is that it comes at a low cost and is transparent to programmers.
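A minimal sketch of such a session layer follows. It assumes a store whose get returns a (value, vector clock) pair; the class and method names are hypothetical, and the real middleware of [28] handles much more (failures, clock pruning, multiple clients).

    def dominates(vc_a, vc_b):
        # True if clock vc_a has seen at least everything vc_b has seen.
        return all(vc_a.get(node, 0) >= n for node, n in vc_b.items())

    class SessionLayer:
        def __init__(self, store, node_id):
            self.store = store    # eventually consistent store: get/put
            self.node_id = node_id
            self.session_vc = {}  # everything this session has observed
            self.cache = {}       # last versions seen by this client

        def put(self, key, value):
            self.session_vc[self.node_id] = self.session_vc.get(self.node_id, 0) + 1
            self.store.put(key, value, dict(self.session_vc))
            self.cache[key] = (value, dict(self.session_vc))

        def get(self, key):
            value, vc = self.store.get(key)
            if key in self.cache and not dominates(vc, self.cache[key][1]):
                # The store returned a stale version: fall back to the
                # cached copy so the client still reads its own writes.
                value, vc = self.cache[key]
            for node, n in vc.items():
                self.session_vc[node] = max(self.session_vc.get(node, 0), n)
            return value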
[29] presents a new transaction paradigm in which programmers can specify different consistency guarantees at the data level. The system enforces these guarantees and allows certain data to dynamically switch from one consistency level to another depending on live statistics about the data (a sketch of this adaptive switching closes the section). Three classes of consistency are defined: class A corresponds to serializable or strong consistency, class B corresponds to session consistency (minimum acceptable level of consistency
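The adaptive part of this design can be sketched as a small policy that promotes an item to strong consistency when live statistics (here, an observed conflict rate) suggest that weak consistency is becoming too costly. The thresholds, names and statistics below are illustrative, not those of the original system.

    STRONG, SESSION = "A", "B"

    class ConsistencyRationer:
        def __init__(self, threshold=0.1):
            self.threshold = threshold
            self.stats = {}  # key -> (accesses, conflicts)

        def record(self, key, conflict):
            accesses, conflicts = self.stats.get(key, (0, 0))
            self.stats[key] = (accesses + 1, conflicts + int(conflict))

        def level(self, key):
            accesses, conflicts = self.stats.get(key, (0, 0))
            if accesses and conflicts / accesses > self.threshold:
                return STRONG   # contended item: pay for serializability
            return SESSION      # rarely contended: session consistency suffices

    rationer = ConsistencyRationer()
    for _ in range(20):
        rationer.record("stock:item42", conflict=True)
    print(rationer.level("stock:item42"))  # "A": switched to strong consistency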
 