many government and business organizations were overwhelmed with data
that could not be effectively processed, linked, and analyzed with traditional
computing approaches.
Several solutions have emerged, including the MapReduce architecture
pioneered by Google and now available in an open-source implementation
called Hadoop, used by Yahoo, Facebook, and others (see Section 17.3,
"Hadoop").
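To make the MapReduce programming model concrete, the following is a minimal sketch of its map, shuffle, and reduce phases applied to word counting. This is plain Python written for illustration; it does not use Hadoop's actual API, and the function names are our own.

```python
# Illustrative sketch of the MapReduce model (word count), not Hadoop's API.
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data needs big systems", "data systems scale out"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts["big"], counts["data"])  # 2 2
```

In a real Hadoop deployment the map and reduce functions run in parallel on many nodes, and the shuffle is performed by the framework over the network; the dataflow, however, is exactly the one sketched above.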
21.3.2.1 Brewer's CAP Theorem and the BASE Principle
In the previous section, we briefly discussed techniques for achieving ACID
properties in a database system. However, applying these techniques in
large-scale scenarios such as data services in the cloud leads to scalability
problems: the amount of data to be stored and processed and the transaction
and query load to be managed are usually too large to run the database
services on a single machine. To overcome this data storage bottleneck, the
database must be stored on multiple nodes, and horizontal scaling is the
typical approach for doing so.
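A common way to partition data horizontally is to assign each record to a node by hashing its key. The sketch below illustrates this idea; the node names, the number of nodes, and the modulo assignment scheme are illustrative assumptions, not a specific system's implementation.

```python
# Hypothetical sketch of hash-based horizontal partitioning (sharding):
# each record is assigned to one of several nodes by hashing its key.
import hashlib

NODES = ["node0", "node1", "node2"]  # assumed cluster layout

def shard_for(key: str) -> str:
    """Deterministically map a record key to the node that stores it."""
    digest = hashlib.sha256(key.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(NODES)
    return NODES[index]

# Every client computes the same node for the same key, so no central
# lookup is needed to route a query such as a flight booking read.
for key in ["booking-1001", "booking-1002", "booking-1003"]:
    print(key, "->", shard_for(key))
```

A drawback of plain modulo hashing is that changing the number of nodes reassigns most keys; production systems therefore often use consistent hashing or range partitioning instead.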
The database is partitioned across the different nodes: either tablewise or
by sharding (see Section 21.3.3 below). Both cases result in a distributed sys-
tem for which Eric Brewer has formulated the famous CAP theorem, which
characterizes three of the main properties of such a system:
1. Consistency : All clients have the same view, even in the case of
updates. For multisite transactions, this requires all-or-nothing
semantics. For replicated data, this implies that all replicas are
always in consistent states.
2. Availability : Availability implies that all clients always find a replica
of data even in the presence of failures.
3. Partition tolerance : In the case of network failures that split the nodes
into groups (partitions), the system is still able to continue the
processing.
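The tension between these three properties can be made concrete with a toy simulation: two replicas of an object o are separated by a network partition, an update reaches only one of them, and the other must then either answer with stale data (preserving availability) or refuse the read (preserving consistency). The class and site names below mirror the example discussed next but are otherwise our own illustrative assumptions.

```python
# Toy simulation of the CAP trade-off with two replicas, S1 and S2.
class Replica:
    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.reachable = True  # can this replica receive update messages?

    def apply_update(self, value):
        """Apply an update if the propagation message arrives."""
        if self.reachable:
            self.value = value
            return True
        return False  # the partition blocks the update message

    def read(self, require_consistency, latest):
        """Serve a read, optionally refusing stale values."""
        if require_consistency and self.value != latest:
            raise RuntimeError(f"{self.name}: refusing stale read")
        return self.value

s1 = Replica("S1", "o")
s2 = Replica("S2", "o")
s2.reachable = False   # a network partition separates S1 and S2
s1.apply_update("o'")  # a client updates o at S1
s2.apply_update("o'")  # propagation to S2 fails silently

print(s2.read(require_consistency=False, latest="o'"))  # available: stale "o"
# With require_consistency=True, S2 raises instead: consistent but unavailable.
```

The simulation shows there is no third option for S2 during the partition: it cannot be both up to date and responsive, which is exactly what the CAP theorem asserts.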
The CAP theorem further states that in a distributed, shared-data system,
these three properties cannot all be achieved simultaneously in the presence
of failures. To understand the implications, we have to consider possible
failures. Suppose that, for scalability reasons, the database runs on two
sites S1 and S2 sharing a data object o, for example, a flight booking record.
This data sharing should be transparent to client applications; that is, an
application AS1 connects to the database via site S1, and an application AS2
accesses it via site S2. Both clients should always see the same state of o,
even in the presence of an update. Hence, to ensure a consistent view, any
update performed, for instance, by AS1 that changes o to a new state o′ has
to be propagated by sending a message m that updates o at S2, so that AS2
reads o′. To understand why