These are fundamental properties in the cloud, and they interact strongly with consistency guarantees. In fact, scalability and strong consistency are somewhat antagonistic: if you want strong consistency, your system cannot scale, and if you want scalability, you have to relax consistency or impose restrictions on your application (for example, keeping each transaction's updates within a single data store).
In [8], Agrawal et al. define two ways to ensure scalability. On the one hand, data fusion combines multiple small data granules to provide transactional guarantees over larger data granules; this is the approach of Google's Bigtable [5], Amazon's Dynamo [7] and Yahoo's PNUTS [6]. On the other hand, data fission splits a database into independent shards or partitions and provides transactional guarantees only within these shards (Relational Cloud [24] and ElasTraS [25]), as the sketch below illustrates.
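To make the data-fission restriction concrete, here is a minimal Python sketch of a store that routes keys to shards by an entity prefix and accepts a transaction only when all of its updates fall on a single shard. The class name, the key scheme ("user:42:name") and the routing rule are hypothetical illustrations, not the interfaces of Relational Cloud or ElasTraS.

    class ShardLocalStore:
        def __init__(self, num_shards):
            self.num_shards = num_shards
            self.shards = [dict() for _ in range(num_shards)]

        def shard_of(self, key):
            # Route by the entity prefix ("user:42") so that all rows of
            # one entity land on the same shard.
            entity = ":".join(key.split(":")[:2])
            return hash(entity) % self.num_shards

        def transaction(self, updates):
            # Apply updates atomically, but only if every key lives on the
            # same shard: the data-fission restriction on transactions.
            shard_ids = {self.shard_of(k) for k in updates}
            if len(shard_ids) != 1:
                raise ValueError("multi-shard transactions are not supported")
            self.shards[shard_ids.pop()].update(updates)

    store = ShardLocalStore(num_shards=4)
    # Both keys share the "user:42" prefix, so the transaction is accepted.
    store.transaction({"user:42:name": "Ada", "user:42:mail": "ada@example.org"})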
Live database migration is a key component for ensuring elasticity. In a shared-disk architecture (Bigtable [5] and ElasTraS [25]), only the database cache has to be migrated: persistent data is stored in network-attached storage and does not need to move. In a shared-nothing architecture (Zephyr [26]), the persistent data, which is considerably larger than the cache, must be migrated as well; the sketch below contrasts the two cases.
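This contrast can be sketched in a few lines of Python; the Node class and migrate_tenant function below are hypothetical simplifications that ignore synchronization, ownership handover and in-flight transactions.

    class Node:
        def __init__(self):
            self.cache = {}  # hot data kept in memory
            self.disk = {}   # local persistent data (shared-nothing only)

    def migrate_tenant(tenant, src, dst, shared_storage):
        # In both architectures the warm cache moves (or is rebuilt) so
        # that the destination node does not start cold.
        dst.cache[tenant] = src.cache.pop(tenant, {})
        if not shared_storage:
            # Shared-nothing (Zephyr's setting): persistent data lives on
            # the source node and must be copied too; it is typically far
            # larger than the cache.
            dst.disk[tenant] = src.disk.pop(tenant)
        # Shared-disk (Bigtable/ElasTraS style): persistent data stays in
        # network-attached storage, so only ownership metadata changes.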
Consistency has been widely studied in the context of transactions (see [27] for a general presentation of distributed transaction processing). For relational DBMSs, a strong consistency model has emerged under the name ACID (Atomicity, Consistency, Isolation and Durability). This model is difficult to scale, which is why large-scale web applications have popularized a new kind of consistency model, the BASE model (Basically Available, Soft state and Eventual consistency). BASE ensures scalability, but at the cost of more complex application code, because programmers have to ensure part of the consistency manually. For example, applications may restrict themselves to accessing data on a single node, avoiding costly distributed protocols to synchronize nodes, and they must reconcile replicas that have diverged, as the sketch below illustrates.
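As an illustration of the consistency work that BASE leaves to the application, the sketch below merges two diverged replicas of per-key counters with an application-chosen rule (keep the maximum value seen per key). The data layout and the merge rule are hypothetical; they stand in for whatever reconciliation logic a given application needs.

    def merge_counters(replica_a, replica_b):
        # Application-defined reconciliation: for each key, keep the
        # largest observed value (a grow-only counter rule).
        merged = dict(replica_a)
        for key, value in replica_b.items():
            merged[key] = max(merged.get(key, 0), value)
        return merged

    # Two replicas accepted writes independently while partitioned...
    a = {"likes:post1": 10, "likes:post2": 3}
    b = {"likes:post1": 12}
    # ...and the application, not the store, decides how to reconcile.
    print(merge_counters(a, b))  # {'likes:post1': 12, 'likes:post2': 3}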
Several works try to help programmers with this task. The authors of [24] developed a workload-aware partitioner that uses graph partitioning to analyze complex query workloads and proposes a mapping of data items to nodes that minimizes the number of multi-node transactions and statements; the sketch below outlines the idea.
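The sketch below gives the flavor of this approach on a toy workload: it builds a co-access graph whose edge weights count how often two items appear in the same transaction, then places items greedily under a capacity bound. Everything here is an illustrative stand-in for the technique of [24], which relies on a real graph partitioner.

    import math
    from collections import defaultdict
    from itertools import combinations

    def co_access_graph(workload):
        # Edge weight = number of transactions in which two items co-occur.
        weights = defaultdict(int)
        for txn in workload:
            for a, b in combinations(sorted(set(txn)), 2):
                weights[(a, b)] += 1
        return weights

    def greedy_partition(items, weights, num_nodes):
        # Place each item with its most co-accessed, already-placed
        # neighbors, subject to a simple per-node capacity bound.
        capacity = math.ceil(len(items) / num_nodes)
        placement, load = {}, [0] * num_nodes
        for item in items:
            scores = [0] * num_nodes
            for (a, b), w in weights.items():
                other = b if a == item else a if b == item else None
                if other in placement:
                    scores[placement[other]] += w
            best = max((n for n in range(num_nodes) if load[n] < capacity),
                       key=lambda n: scores[n])
            placement[item] = best
            load[best] += 1
        return placement

    workload = [["u1", "u2"], ["u1", "u2"], ["u3", "u4"], ["u3", "u4"], ["u2", "u3"]]
    weights = co_access_graph(workload)
    print(greedy_partition(["u1", "u2", "u3", "u4"], weights, 2))
    # {'u1': 0, 'u2': 0, 'u3': 1, 'u4': 1}: only one light edge crosses nodes.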
[28] is a recent work addressing the problem of client-centric consistency on top of eventually consistent distributed data stores (Amazon S3, for example). It consists of a middleware service that runs on the same server as the application and provides the behavior of a causally consistent data store, even in the presence of failures or concurrent updates. This service uses vector clocks and client-side caching to ensure the client-centric properties; a sketch follows below. The main interest of this proposal is that it comes at a low cost and is transparent to programmers.
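A minimal sketch of such a session layer follows. It assumes a store whose get returns a (value, vector clock) pair; the class and method names are hypothetical, and the real middleware of [28] handles much more (failures, clock pruning, multiple clients).

    def dominates(vc_a, vc_b):
        # True if clock vc_a has seen at least everything vc_b has seen.
        return all(vc_a.get(node, 0) >= n for node, n in vc_b.items())

    class SessionLayer:
        def __init__(self, store, node_id):
            self.store = store    # eventually consistent store: get/put
            self.node_id = node_id
            self.session_vc = {}  # everything this session has observed
            self.cache = {}       # last versions seen by this client

        def put(self, key, value):
            self.session_vc[self.node_id] = self.session_vc.get(self.node_id, 0) + 1
            self.store.put(key, value, dict(self.session_vc))
            self.cache[key] = (value, dict(self.session_vc))

        def get(self, key):
            value, vc = self.store.get(key)
            if key in self.cache and not dominates(vc, self.cache[key][1]):
                # The store returned a stale version: fall back to the
                # cached copy so the client still reads its own writes.
                value, vc = self.cache[key]
            for node, n in vc.items():
                self.session_vc[node] = max(self.session_vc.get(node, 0), n)
            return value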
[29] presents a new transaction paradigm in which programmers can specify different consistency guarantees at the data level. The system enforces these guarantees and allows certain data to dynamically switch from one consistency level to another depending on live statistics about the data (a sketch of this adaptive switching closes the section). Three classes of consistency are defined: class A corresponds to serializable or strong consistency, class B corresponds to session consistency (minimum acceptable level of consistency
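The adaptive part of this design can be sketched as a small policy that promotes an item to strong consistency when live statistics (here, an observed conflict rate) suggest that weak consistency is becoming too costly. The thresholds, names and statistics below are illustrative, not those of the original system.

    STRONG, SESSION = "A", "B"

    class ConsistencyRationer:
        def __init__(self, threshold=0.1):
            self.threshold = threshold
            self.stats = {}  # key -> (accesses, conflicts)

        def record(self, key, conflict):
            accesses, conflicts = self.stats.get(key, (0, 0))
            self.stats[key] = (accesses + 1, conflicts + int(conflict))

        def level(self, key):
            accesses, conflicts = self.stats.get(key, (0, 0))
            if accesses and conflicts / accesses > self.threshold:
                return STRONG   # contended item: pay for serializability
            return SESSION      # rarely contended: session consistency suffices

    rationer = ConsistencyRationer()
    for _ in range(20):
        rationer.record("stock:item42", conflict=True)
    print(rationer.level("stock:item42"))  # "A": switched to strong consistency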
 