throughput. The disadvantage is the greater risk of data loss if a server crashes and loses unsynched updates.
Synchronous vs. asynchronous replication: Synchronous replication ensures that all copies are up-to-date, but it potentially incurs high latency on updates. Furthermore, availability may be impacted if synchronously replicated updates cannot complete while some replicas are offline. Asynchronous replication avoids high write latency but allows replicas to be stale. Furthermore, updates may be lost if a failure occurs before they can be replicated (see the replication sketch after this list).
Data partitioning: Systems may be strictly row-based or allow for column storage. Row-based storage supports efficient access to an entire record and is ideal if we typically access a few records in their entirety. Column-based storage is more efficient for accessing a subset of the columns, particularly when many records are accessed (see the layout sketch after this list).
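To make the replication tradeoff concrete, the following is a minimal, hypothetical Python sketch; the class and method names are invented for illustration and do not correspond to any particular system. In synchronous mode the primary waits for every replica to apply a write before acknowledging it, so copies stay up-to-date at the price of latency and of blocking when a replica is unreachable; in asynchronous mode the write is acknowledged immediately and replicas catch up in the background, so they may briefly serve stale data and an unreplicated update can be lost.

```python
import queue
import threading
import time


class Replica:
    """A single copy of the data (hypothetical in-memory store)."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary:
    """Primary node that forwards writes to its replicas."""

    def __init__(self, replicas, synchronous=True):
        self.local = Replica()
        self.replicas = replicas
        self.synchronous = synchronous
        self._log = queue.Queue()
        if not synchronous:
            # Background thread drains the log and updates replicas lazily.
            threading.Thread(target=self._drain, daemon=True).start()

    def write(self, key, value):
        self.local.apply(key, value)
        if self.synchronous:
            # Wait for every replica before acknowledging: copies stay
            # current, but latency tracks the slowest replica and the write
            # blocks if a replica is offline.
            for r in self.replicas:
                r.apply(key, value)
        else:
            # Acknowledge immediately; replicas catch up later and may be
            # stale (or lose this update if the primary crashes first).
            self._log.put((key, value))

    def _drain(self):
        while True:
            key, value = self._log.get()
            time.sleep(0.05)          # simulated replication delay
            for r in self.replicas:
                r.apply(key, value)


if __name__ == "__main__":
    replicas = [Replica(), Replica()]
    async_primary = Primary(replicas, synchronous=False)
    async_primary.write("x", 1)
    print(replicas[0].data.get("x"))  # likely None: replica still stale
    time.sleep(0.1)
    print(replicas[0].data.get("x"))  # 1 after the background copy runs
```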
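The row versus column layout tradeoff can likewise be illustrated with a small, hypothetical sketch: the same three-attribute table is stored once as a list of records and once as one array per attribute. Fetching a whole record touches a single entry in the row layout, whereas aggregating one attribute over many records touches only one array in the column layout.

```python
# Hypothetical table stored in two layouts.
rows = [
    {"id": 1, "name": "alice", "balance": 120.0},
    {"id": 2, "name": "bob",   "balance": 80.5},
    {"id": 3, "name": "carol", "balance": 42.0},
]

# Column layout: one array per attribute, aligned by position.
columns = {
    "id":      [r["id"] for r in rows],
    "name":    [r["name"] for r in rows],
    "balance": [r["balance"] for r in rows],
}

# Row layout favors reading one record in its entirety.
record = rows[1]                      # one lookup returns every attribute
print(record)

# Column layout favors scanning one attribute across many records.
total = sum(columns["balance"])       # touches only the "balance" array
print(total)
```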
Florescu and Kossmann [32] argued that in a cloud environment, the main
metric that needs to be optimized is the cost as measured in dollars. Therefore,
the big challenge of data management applications is no longer how fast a
database workload can be executed or whether a particular throughput can be
achieved; instead, the challenge is how many machines are necessary to meet
the performance requirements of a particular workload. This argument fits well
with a rule-of-thumb calculation that has been proposed by Jim Gray regarding
the opportunity costs of distributed computing on the Internet as opposed to local
computations [35]. Gray reasons that, except for highly processing-intensive applications, outsourcing computing tasks to a distributed environment does not pay off, because network traffic fees outweigh the savings in processing power. In principle, calculating the tradeoff between basic computing services can be useful for getting a general idea of the economies involved. This method can easily be applied to the pricing schemes of cloud computing providers (e.g., Amazon, Google).
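As an illustration of this kind of back-of-the-envelope reasoning, the short calculation below compares the network fee for shipping a dataset to a remote provider against the compute cost saved by running the job there. All prices and workload figures are invented placeholders, not Gray's original numbers or any provider's actual rates; only the structure of the tradeoff matters.

```python
# Hypothetical prices; substitute a provider's real rate card.
NETWORK_FEE_PER_GB = 0.10      # $ per GB transferred to the provider
CPU_PRICE_PER_HOUR = 0.05      # $ per CPU-hour rented remotely


def outsourcing_pays_off(data_gb, cpu_hours, local_cpu_cost_per_hour):
    """Return True if shipping the data out is cheaper than computing locally."""
    transfer_cost = data_gb * NETWORK_FEE_PER_GB
    remote_compute_cost = cpu_hours * CPU_PRICE_PER_HOUR
    local_compute_cost = cpu_hours * local_cpu_cost_per_hour
    return transfer_cost + remote_compute_cost < local_compute_cost


# Data-heavy, compute-light job: moving the data dominates, so it stays local.
print(outsourcing_pays_off(data_gb=1000, cpu_hours=10,
                           local_cpu_cost_per_hour=0.08))   # False

# Compute-heavy job on little data: outsourcing pays off.
print(outsourcing_pays_off(data_gb=1, cpu_hours=5000,
                           local_cpu_cost_per_hour=0.08))   # True
```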
Florescu and Kossmann [32] have also argued that, in the new large-scale web applications, the requirement to provide 100% read and write availability for all users has overshadowed the importance of the ACID paradigm as the gold standard for data consistency. In these applications, no user is ever allowed to be blocked. Hence, consistency has turned into an optimization goal in modern data management systems: the aim is to minimize the cost of resolving inconsistencies, rather than treating consistency as a constraint, as in traditional database systems. Therefore, it is better to design a system that deals with resolving inconsistencies than one that prevents inconsistencies under all circumstances.
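One common way to "resolve rather than prevent" inconsistencies is to accept writes on any replica and reconcile divergent versions afterwards. The sketch below uses last-writer-wins reconciliation based on write timestamps purely as an illustration; it is not a mechanism proposed in [32], and real systems often use richer schemes (vector clocks, application-level merge functions).

```python
from dataclasses import dataclass


@dataclass
class Version:
    """A replica's copy of one item, tagged with the time it was written."""
    value: str
    timestamp: float    # wall-clock write time (a simplifying assumption)


def resolve(versions):
    """Last-writer-wins: keep the version with the newest timestamp."""
    return max(versions, key=lambda v: v.timestamp)


# Two replicas accepted conflicting writes while partitioned.
replica_a = Version(value="shipped", timestamp=1_700_000_100.0)
replica_b = Version(value="cancelled", timestamp=1_700_000_160.0)

# Instead of blocking either writer, the system reconciles later
# (e.g., on read or during anti-entropy).
winner = resolve([replica_a, replica_b])
print(winner.value)   # "cancelled": the later write wins
```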
Kossmann et al. [41] conducted an end-to-end experimental evaluation of the performance and cost of running enterprise web applications with OLTP workloads on alternative cloud services (e.g., RDS, SimpleDB, S3, Google AppEngine, Azure). The results of the experiments showed that the alternative services varied greatly in both cost and performance. Most services had significant scalability issues. They confirmed the observation that public clouds lack support for uploading large data volumes: it was difficult for them to upload 1 TB or more of raw data through the providers' APIs. With regard to cost, they concluded that Google