many government and business organizations were overwhelmed with data
that could not be effectively processed, linked, and analyzed with traditional
computing approaches.
Several solutions have emerged, including the MapReduce architecture
pioneered by Google and now available in an open-source implementation
called Hadoop, used by Yahoo, Facebook, and others (see Section 17.3,
"Hadoop").
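To make the MapReduce programming model concrete, the following is a minimal sketch of its map, shuffle, and reduce phases applied to word counting. This is plain Python written for illustration; it does not use Hadoop's actual API, and the function names are our own.

```python
# Illustrative sketch of the MapReduce model (word count), not Hadoop's API.
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data needs big systems", "data systems scale out"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts["big"], counts["data"])  # 2 2
```

In a real Hadoop deployment the map and reduce functions run in parallel on many nodes, and the shuffle is performed by the framework over the network; the dataflow, however, is exactly the one sketched above.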
21.3.2.1 Brewer's CAP Theorem and the BASE Principle
In the previous section, we briefly discussed techniques for achieving ACID
properties in a database system. However, applying these techniques in
large-scale scenarios such as data services in the cloud leads to scalability
problems: the amount of data to be stored and processed and the transaction
and query load to be managed are usually too large to run the database
services on a single machine. To overcome this data storage bottleneck, the
database must be stored on multiple nodes, and horizontal scaling is the
typical approach for doing so.
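A common way to partition data horizontally is to assign each record to a node by hashing its key. The sketch below illustrates this idea; the node names, the number of nodes, and the modulo assignment scheme are illustrative assumptions, not a specific system's implementation.

```python
# Hypothetical sketch of hash-based horizontal partitioning (sharding):
# each record is assigned to one of several nodes by hashing its key.
import hashlib

NODES = ["node0", "node1", "node2"]  # assumed cluster layout

def shard_for(key: str) -> str:
    """Deterministically map a record key to the node that stores it."""
    digest = hashlib.sha256(key.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(NODES)
    return NODES[index]

# Every client computes the same node for the same key, so no central
# lookup is needed to route a query such as a flight booking read.
for key in ["booking-1001", "booking-1002", "booking-1003"]:
    print(key, "->", shard_for(key))
```

A drawback of plain modulo hashing is that changing the number of nodes reassigns most keys; production systems therefore often use consistent hashing or range partitioning instead.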
The database is partitioned across the different nodes: either tablewise or
by sharding (see Section 21.3.3 below). Both cases result in a distributed sys-
tem for which Eric Brewer has formulated the famous CAP theorem, which
characterizes three of the main properties of such a system:
1. Consistency : All clients have the same view, even in the case of
updates. For multisite transactions, this requires all-or-nothing
semantics. For replicated data, this implies that all replicas are
always in consistent states.
2. Availability : Availability implies that all clients always find a replica
of data even in the presence of failures.
3. Partition tolerance : In the case of network failures that split the nodes
into groups (partitions), the system is still able to continue the
processing.
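The tension between these three properties can be made concrete with a toy simulation: two replicas of an object o are separated by a network partition, an update reaches only one of them, and the other must then either answer with stale data (preserving availability) or refuse the read (preserving consistency). The class and site names below mirror the example discussed next but are otherwise our own illustrative assumptions.

```python
# Toy simulation of the CAP trade-off with two replicas, S1 and S2.
class Replica:
    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.reachable = True  # can this replica receive update messages?

    def apply_update(self, value):
        """Apply an update if the propagation message arrives."""
        if self.reachable:
            self.value = value
            return True
        return False  # the partition blocks the update message

    def read(self, require_consistency, latest):
        """Serve a read, optionally refusing stale values."""
        if require_consistency and self.value != latest:
            raise RuntimeError(f"{self.name}: refusing stale read")
        return self.value

s1 = Replica("S1", "o")
s2 = Replica("S2", "o")
s2.reachable = False   # a network partition separates S1 and S2
s1.apply_update("o'")  # a client updates o at S1
s2.apply_update("o'")  # propagation to S2 fails silently

print(s2.read(require_consistency=False, latest="o'"))  # available: stale "o"
# With require_consistency=True, S2 raises instead: consistent but unavailable.
```

The simulation shows there is no third option for S2 during the partition: it cannot be both up to date and responsive, which is exactly what the CAP theorem asserts.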
The CAP theorem further states that in a distributed, shared-data system,
these three properties cannot all be achieved simultaneously in the presence
of failures. To understand the implications, we have to consider possible
failures. Suppose that, for scalability reasons, the database runs on two
sites S1 and S2 sharing a data object o, for example, a flight booking record.
This data sharing should be transparent to client applications; that is, an
application AS1 connects to the database via site S1, and an application AS2
accesses it via site S2. Both clients should always see the same state of o,
even in the presence of an update. Hence, to ensure a consistent view, any
update performed, for instance, by AS1 that changes o to a new state o′ has
to be propagated by sending a message m that updates o at S2, so that AS2
reads o′. To understand why