Big Data Computing Applications - Guide to Cloud Computing for Business and Technology Managers


partition tolerance if the system is operational even when the network

between two components of the system is down.

Since, according to the CAP theorem, a distributed system can satisfy only

two of the three properties, there are three types of distributed systems. CA

(consistent, available) systems provide consistency and availability, but cannot

tolerate network partitions. An example of a CA system is a clustered data-

base, where each node stores a subset of the data. Such a database cannot

provide availability in the case of network partitioning, since queries to data

in the partitioned nodes must fail. CA systems may not be useful for cloud

computing, since partitions are likely to occur in medium to large networks

(including the case where the latency is very high). If there is no network

partitioning, all servers are consistent, and the value seen by both clients is

the correct value.

However, if the network is partitioned, it is no longer possible to keep all

the servers consistent in the face of updates. There are then two choices. One

choice is to keep both servers up and ignore the inconsistency. This leads to

AP (available, partition-tolerant) systems where the system is always avail-

able, but may not return consistent results. The other possible choice is to

bring one of the servers down, to avoid inconsistent values. This leads to

CP (consistent, partition-tolerant) systems where the system always returns

consistent results but may be unavailable under partitioning (including the

case where the latency is very high). AP systems provide weak consistency.
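
The two choices described above can be illustrated with a toy model. The following is a minimal sketch, not a real database: the names `Replica`, `TwoServerSystem`, and the `mode` parameter are hypothetical and exist only for this illustration. It assumes two servers that each hold one value and that replication simply stops while the network is partitioned.

```python
class Replica:
    """One server holding a single value (hypothetical toy model)."""

    def __init__(self, value):
        self.value = value
        self.up = True


class TwoServerSystem:
    """Two replicas that react to a partition in AP or CP style."""

    def __init__(self, mode, initial=0):
        if mode not in ("AP", "CP"):
            raise ValueError("mode must be 'AP' or 'CP'")
        self.mode = mode
        self.servers = [Replica(initial), Replica(initial)]
        self.partitioned = False

    def partition(self):
        """Simulate the network link between the two servers going down."""
        self.partitioned = True
        if self.mode == "CP":
            # CP choice: bring one server down to avoid inconsistent values.
            self.servers[1].up = False
        # AP choice: keep both servers up and accept the inconsistency.

    def write(self, index, value):
        server = self.servers[index]
        if not server.up:
            raise ConnectionError("server is down")
        server.value = value
        if not self.partitioned:
            # Without a partition the update reaches every server.
            for other in self.servers:
                other.value = value

    def read(self, index):
        server = self.servers[index]
        if not server.up:
            raise ConnectionError("server is down")
        return server.value


# AP: both servers answer, but the second one returns a stale value.
ap = TwoServerSystem("AP")
ap.partition()
ap.write(0, 42)
ap_values = (ap.read(0), ap.read(1))

# CP: the surviving server is consistent, the other one is unavailable.
cp = TwoServerSystem("CP")
cp.partition()
cp.write(0, 42)
try:
    cp.read(1)
    cp_second_available = True
except ConnectionError:
    cp_second_available = False
```

In this sketch the AP system answers every query but the two servers disagree after the update, while the CP system never returns a stale value but rejects queries sent to the server that was brought down.
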

An important subclass of weakly consistent systems is those that provide

eventual consistency. A system is defined as being eventually consistent if

the system is guaranteed to reach a consistent state in a finite amount of time

if there are no failures (e.g., network partitions) and no updates are made.

The inconsistency window for such systems is the maximum amount of time

that can elapse between the time that the update is made and the time that

the update is guaranteed to be visible to all clients. If the inconsistency

window is small compared to the interval between updates, then one method of dealing with

stale data is to wait for a period greater than the inconsistency window and

then retry the query.
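
The wait-and-retry approach can be sketched as follows. This is a minimal illustration under stated assumptions: `EventuallyConsistentStore`, `read_after_window`, and the simulated clock are hypothetical, and every update is assumed to become visible exactly when the inconsistency window has elapsed (the worst case). A real client would sleep instead of advancing a simulated clock.

```python
class EventuallyConsistentStore:
    """Toy store in which writes become visible after a fixed delay."""

    def __init__(self, inconsistency_window):
        self.window = inconsistency_window
        self.now = 0.0        # simulated clock, in seconds
        self.visible = {}     # key -> value currently visible to all clients
        self.pending = []     # (visible_at, key, value), in write order

    def write(self, key, value):
        # The update is accepted now but only guaranteed visible later.
        self.pending.append((self.now + self.window, key, value))

    def advance(self, seconds):
        """Move the simulated clock forward and apply updates that are due."""
        self.now += seconds
        still_pending = []
        for visible_at, key, value in self.pending:
            if visible_at <= self.now:
                self.visible[key] = value
            else:
                still_pending.append((visible_at, key, value))
        self.pending = still_pending

    def read(self, key):
        return self.visible.get(key)


def read_after_window(store, key, expected, wait):
    """Read once; if the value looks stale, wait and retry the query once."""
    value = store.read(key)
    if value != expected:
        store.advance(wait)   # stands in for time.sleep(wait) in a real client
        value = store.read(key)
    return value


store = EventuallyConsistentStore(inconsistency_window=2.0)
store.write("x", 1)
first_read = store.read("x")                                # stale: not visible yet
retried = read_after_window(store, "x", expected=1, wait=2.5)  # wait > window
```

The retry only helps when no further update arrives during the wait, which is why the approach suits workloads where the window is short relative to the time between updates.
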

Classic database systems focus on guaranteeing the ACID properties and,

therefore, favor consistency over partition tolerance and availability. This is

achieved by employing techniques like distributed locking and two-phase

commit protocols. In certain circumstances, data needs are not transaction-

ally focused, and at such times, the relational model is not the most appropri-

ate one for what we need to do with the data we are storing. However, giving

up availability is often not an option in Web business where users expect a

24 × 7 or always-on operation.
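
To make the two-phase commit idea mentioned above concrete, here is a minimal sketch of the protocol's voting and decision phases. It is an in-memory illustration only: `Participant`, `two_phase_commit`, and the `can_commit` flag are hypothetical names, and timeouts, logging, and recovery, which real implementations need, are left out.

```python
class Participant:
    """One node taking part in a distributed transaction (toy model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # hypothetical switch to force a "no" vote
        self.state = "init"
        self.value = None
        self._staged = None

    def prepare(self, value):
        """Phase 1: stage the value and vote yes, or vote no."""
        if self.can_commit:
            self._staged = value
            self.state = "prepared"
            return True
        self.state = "aborted"
        return False

    def commit(self):
        """Phase 2 (all voted yes): make the staged value permanent."""
        self.value = self._staged
        self._staged = None
        self.state = "committed"

    def abort(self):
        """Phase 2 (someone voted no): discard the staged value."""
        self._staged = None
        self.state = "aborted"


def two_phase_commit(participants, value):
    """Coordinator: commit on every node or on none."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare(value) for p in participants]
    # Phase 2: commit only if the vote was unanimous.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


nodes = [Participant("a"), Participant("b")]
committed = two_phase_commit(nodes, 7)

failing = [Participant("a"), Participant("b", can_commit=False)]
rejected = two_phase_commit(failing, 7)
```

Because a single "no" vote aborts the update on every node, the sketch shows how the protocol preserves consistency at the expense of availability: a participant that cannot take part prevents the transaction from completing.
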

Most traditional RDBMSs guarantee that the values in all our

nodes are identical before they allow another user to read the values. But as we

have seen, that comes at a significant cost in terms of performance. Relational

databases, with their large processing overhead in terms of maintaining the ACID

attributes of the data they store and their reliance on potentially processor
