hungry joins, are not the right tool for the task they have before them: quickly finding relevant data in terabytes of unstructured data (Web content) that may be stored across thousands of geographically disparate nodes. In other words, the relational model does not scale well for this type of data. Moreover, techniques for guaranteeing strong consistency in large distributed systems limit scalability and result in latency issues. To cope with these problems, BASE was proposed as an alternative to ACID.
21.3.2.2 BASE (Basically Available, Soft State, Eventual Consistency)
BASE follows an optimistic approach, accepting stale data and approximate answers while favoring availability. Some ways to achieve this are supporting partial failures without total system failure, decoupling updates on different tables (i.e., relaxing consistency), and using idempotent operations that can be applied multiple times with the same result. In this sense, BASE describes more a spectrum of architectural styles than a single model. Eventual consistency can be provided through read repair: when the system detects stale data during a read operation, the outdated copies are refreshed with the latest version of the data. Another approach is
that of weak consistency. In this case, the read operation will return the first
value found, not checking for staleness. Any stale nodes discovered are sim-
ply marked for updating at some stage in the future. This is a performance-
focused approach but has the associated risk that data retrieved may not
be the most current. In the following sections, we will discuss several tech-
niques for implementing services following the BASE principle.
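
As a rough illustration of read repair and idempotent updates, the following minimal sketch (in Python, with all class and function names hypothetical) reads a key from several replicas, returns the value with the highest version number, and writes that version back to any replica found holding stale or missing data; because the put operation ignores older versions, re-applying the repair is harmless.

# Minimal read-repair sketch; all names are hypothetical.
from dataclasses import dataclass

@dataclass
class Versioned:
    version: int
    value: str

class Replica:
    def __init__(self):
        self.store = {}                    # key -> Versioned

    def get(self, key):
        return self.store.get(key)

    def put(self, key, item):
        current = self.store.get(key)
        # Idempotent: re-applying the same (or an older) version is a no-op.
        if current is None or item.version > current.version:
            self.store[key] = item

def read_with_repair(replicas, key):
    results = [(r, r.get(key)) for r in replicas]
    found = [item for _, item in results if item is not None]
    if not found:
        return None
    newest = max(found, key=lambda item: item.version)
    # Repair: refresh any replica that returned stale or missing data.
    for replica, item in results:
        if item is None or item.version < newest.version:
            replica.put(key, newest)
    return newest.value

# Usage: one replica holds a stale copy; the read returns the latest
# value and repairs the stale and missing copies as a side effect.
r1, r2, r3 = Replica(), Replica(), Replica()
r1.put("user:42", Versioned(2, "new profile"))
r2.put("user:42", Versioned(1, "old profile"))    # stale
print(read_with_repair([r1, r2, r3], "user:42"))  # -> "new profile"
print(r2.get("user:42").version)                  # -> 2 (repaired)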
Conventional storage techniques may not be adequate for big data and, hence, for cloud applications. To scale storage systems to cloud scale, the
basic technique is to partition and replicate the data over multiple inde-
pendent storage systems. The word independent is emphasized, since it is
well-known that databases can be partitioned into mutually dependent sub-
databases that are automatically synchronized for reasons of performance
and availability. Partitioning and replication increase the overall throughput of the system, since the total throughput of the combined system is the
aggregate of the individual storage systems. To scale both the throughput
and the maximum size of the data that can be stored beyond the limits of tra-
ditional database deployments, it is possible to partition the data, and store
each partition in its own database. For scaling the throughput only, it is pos-
sible to use replication. Partitioning and replication also increase the storage
capacity of a storage system by reducing the amount of data that needs to be
stored in each partition. However, this creates synchronization and consis-
tency problems, and discussion of this aspect is out of scope for this topic.
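
To make the partitioning and replication idea concrete, the following minimal sketch (Python, with hypothetical names) hashes each key to one of several independent partitions and applies every write to all replicas of that partition; capacity grows with the number of partitions, read throughput grows with the number of replicas, and the synchronization issues mentioned above are deliberately ignored.

# Minimal sketch of hash partitioning with replication; names hypothetical.
import hashlib
import random

class PartitionedStore:
    def __init__(self, num_partitions=4, replicas_per_partition=3):
        # Each partition is an independent set of replicas (plain dicts here).
        self.partitions = [
            [dict() for _ in range(replicas_per_partition)]
            for _ in range(num_partitions)
        ]

    def _partition_for(self, key):
        # Hash the key to pick the owning partition.
        digest = hashlib.md5(key.encode()).hexdigest()
        return self.partitions[int(digest, 16) % len(self.partitions)]

    def put(self, key, value):
        # Write to all replicas of the owning partition.
        for replica in self._partition_for(key):
            replica[key] = value

    def get(self, key):
        # Read from any single replica; replication spreads read load.
        return random.choice(self._partition_for(key)).get(key)

store = PartitionedStore()
store.put("page:/index.html", "<html>...</html>")
print(store.get("page:/index.html"))

In a real deployment, each replica would be a separate storage system on its own node rather than an in-memory dictionary, and the partitioning function would be chosen to allow rebalancing as nodes are added or removed.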
The other technology for scaling storage described in this section is known as Not only SQL (NoSQL). NoSQL was developed as a
reaction to the perception that conventional databases, focused on the need
to ensure data integrity for enterprise applications, were too rigid to scale