hungry joins, are not the right tool for the task they have before them: quickly finding relevant data in terabytes of unstructured data (Web content) that may be stored across thousands of geographically disparate nodes. In other words, the relational model does not scale well for this type of data. Moreover, techniques for guaranteeing strong consistency in large distributed systems limit scalability and result in latency issues. To cope with these problems, BASE was proposed as an alternative to ACID.
21.3.2.2 BASE (Basically Available, Soft State, Eventual Consistency)
BASE follows an optimistic approach, accepting stale data and approximate answers while favoring availability. Some ways to achieve this are supporting partial failures without total system failure, decoupling updates on different tables (i.e., relaxing consistency), and using idempotent operations that can be applied multiple times with the same result. In this sense, BASE describes more a spectrum of architectural styles than a single model. Eventual consistency can be provided through read repair: when the system detects stale data during a read operation, the outdated copies are refreshed with the latest version of the data. Another approach is
that of weak consistency. In this case, the read operation will return the first
value found, not checking for staleness. Any stale nodes discovered are sim-
ply marked for updating at some stage in the future. This is a performance-
focused approach but has the associated risk that data retrieved may not
be the most current. In the following sections, we will discuss several tech-
niques for implementing services following the BASE principle.
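
As a rough illustration of read repair and idempotent updates, the following minimal sketch (in Python, with all class and function names hypothetical) reads a key from several replicas, returns the value with the highest version number, and writes that version back to any replica found holding stale or missing data; because the put operation ignores older versions, re-applying the repair is harmless.

# Minimal read-repair sketch; all names are hypothetical.
from dataclasses import dataclass

@dataclass
class Versioned:
    version: int
    value: str

class Replica:
    def __init__(self):
        self.store = {}                    # key -> Versioned

    def get(self, key):
        return self.store.get(key)

    def put(self, key, item):
        current = self.store.get(key)
        # Idempotent: re-applying the same (or an older) version is a no-op.
        if current is None or item.version > current.version:
            self.store[key] = item

def read_with_repair(replicas, key):
    results = [(r, r.get(key)) for r in replicas]
    found = [item for _, item in results if item is not None]
    if not found:
        return None
    newest = max(found, key=lambda item: item.version)
    # Repair: refresh any replica that returned stale or missing data.
    for replica, item in results:
        if item is None or item.version < newest.version:
            replica.put(key, newest)
    return newest.value

# Usage: one replica holds a stale copy; the read returns the latest
# value and repairs the stale and missing copies as a side effect.
r1, r2, r3 = Replica(), Replica(), Replica()
r1.put("user:42", Versioned(2, "new profile"))
r2.put("user:42", Versioned(1, "old profile"))    # stale
print(read_with_repair([r1, r2, r3], "user:42"))  # -> "new profile"
print(r2.get("user:42").version)                  # -> 2 (repaired)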
Conventional storage techniques may not be adequate for big data and, hence, for cloud applications. To scale storage systems to cloud scale, the
basic technique is to partition and replicate the data over multiple inde-
pendent storage systems. The word independent is emphasized, since it is
well-known that databases can be partitioned into mutually dependent sub-
databases that are automatically synchronized for reasons of performance
and availability. Partitioning and replication increase the overall throughput of the system, since the total throughput of the combined system is the
aggregate of the individual storage systems. To scale both the throughput
and the maximum size of the data that can be stored beyond the limits of tra-
ditional database deployments, it is possible to partition the data, and store
each partition in its own database. For scaling the throughput only, it is pos-
sible to use replication. Partitioning and replication also increase the storage
capacity of a storage system by reducing the amount of data that needs to be
stored in each partition. However, this creates synchronization and consis-
tency problems, and discussion of this aspect is out of scope for this topic.
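
To make the partitioning and replication idea concrete, the following minimal sketch (Python, with hypothetical names) hashes each key to one of several independent partitions and applies every write to all replicas of that partition; capacity grows with the number of partitions, read throughput grows with the number of replicas, and the synchronization issues mentioned above are deliberately ignored.

# Minimal sketch of hash partitioning with replication; names hypothetical.
import hashlib
import random

class PartitionedStore:
    def __init__(self, num_partitions=4, replicas_per_partition=3):
        # Each partition is an independent set of replicas (plain dicts here).
        self.partitions = [
            [dict() for _ in range(replicas_per_partition)]
            for _ in range(num_partitions)
        ]

    def _partition_for(self, key):
        # Hash the key to pick the owning partition.
        digest = hashlib.md5(key.encode()).hexdigest()
        return self.partitions[int(digest, 16) % len(self.partitions)]

    def put(self, key, value):
        # Write to all replicas of the owning partition.
        for replica in self._partition_for(key):
            replica[key] = value

    def get(self, key):
        # Read from any single replica; replication spreads read load.
        return random.choice(self._partition_for(key)).get(key)

store = PartitionedStore()
store.put("page:/index.html", "<html>...</html>")
print(store.get("page:/index.html"))

In a real deployment, each replica would be a separate storage system on its own node rather than an in-memory dictionary, and the partitioning function would be chosen to allow rebalancing as nodes are added or removed.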
The other technology for scaling storage described in this section is known as Not only SQL (NoSQL). NoSQL was developed as a
reaction to the perception that conventional databases, focused on the need
to ensure data integrity for enterprise applications, were too rigid to scale