Building a NoSQL-Based Web App to Collect Crowd-Sourced Data - Data Just Right: Introduction to Large-Scale Data and Analytics

machine block existing connections for a short time, affecting system availability, in

order to make sure data is consistent with the other machine? Dealing with data at

scale using multiple machines presents challenges like this all the time.

The consequences of the CAP theorem freed database designers to come up with

alternative ways of thinking about scalability. In a clever play on words, an alternative

to the rules of ACID compliance came to be known as BASE, short for “basically

available, soft state, eventually consistent.” In 2008, Dan Pritchett, then a technical

fellow at eBay, published an ACM article entitled “BASE: An Acid Alternative,” 1 in

which he stated that while “ACID is pessimistic and forces consistency at the end of

every operation, BASE is optimistic and accepts that the database consistency will be

in a state of flux.”

Put simply, BASE systems strive to maximize some aspect of CAP, such as availability

or partition tolerance, by allowing parts of the database system to exist in different

states of consistency. For example, when using many machines, a database write

to one machine might not be immediately visible to the entire system. The state of this

type of system will become consistent—eventually. In an ACID-compliant relational

database, we would expect data from an insertion to be available to the system and all

users immediately. However, in our distributed system, while a database insertion is

happening on one machine a client could be requesting data stored on another. If two

clients request the same piece of data from separate machines, the two values retrieved

may be out of sync. Developers who follow the guidelines of BASE architecture accept

these inconsistencies as less important than the need for the system to scale well.
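
The out-of-sync read described above can be illustrated with a small simulation. This is only a toy sketch, not how any particular database replicates data: the class names (Replica, EventuallyConsistentStore), the machine names, and the "votes" key are all hypothetical, and replication is modeled as a simple in-memory queue that is drained on demand.

```python
from collections import deque


class Replica:
    """One machine holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class EventuallyConsistentStore:
    """Toy store: a write lands on one replica and is queued for the others."""

    def __init__(self, replica_names):
        self.replicas = {name: Replica(name) for name in replica_names}
        # Writes not yet applied elsewhere: (target replica, key, value).
        self.pending = deque()

    def write(self, replica_name, key, value):
        # Apply the write locally and return at once (availability first).
        self.replicas[replica_name].data[key] = value
        for name in self.replicas:
            if name != replica_name:
                self.pending.append((name, key, value))

    def read(self, replica_name, key):
        return self.replicas[replica_name].read(key)

    def replicate(self):
        # Drain the queue; afterwards every replica has seen every write.
        while self.pending:
            name, key, value = self.pending.popleft()
            self.replicas[name].data[key] = value


store = EventuallyConsistentStore(["machine_a", "machine_b"])
store.write("machine_a", "votes", 1)

# Two clients ask different machines for the same key before replication.
before = (store.read("machine_a", "votes"), store.read("machine_b", "votes"))
store.replicate()
after = (store.read("machine_a", "votes"), store.read("machine_b", "votes"))

print(before)  # (1, None): the two clients see different values
print(after)   # (1, 1): the replicas have converged
```

The write returns without waiting for the second machine, so the system stays available, and the price is the window in which the two reads disagree.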

Why can't a relational database model be distributed across many machines? The

answer is simple: It can. However, as we've seen, the process of distributing a relational

database over a large pool of machines exposes a great deal of complexity for the

database administrator.

Relational databases are the best design for applications in which data is absolutely

required to be inserted in a consistent state. The canonical example of this is a financial

transaction database: users will not tolerate a database that provides inconsistent

responses about money. For some types of Web content, applications are mostly geared

toward a specific task, such as serving many users with content that doesn't change

very much. Sometimes, an application is focused on being able to collect data very

quickly and using many machines, and immediate consistency of the data is not a

requirement. For these applications, a different architecture may be more useful.
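
For contrast, the financial-transaction case mentioned above can be sketched with Python's built-in sqlite3 module standing in for a relational database. The accounts table, the account names, and the transfer helper are hypothetical and exist only for illustration. The point is that a failed transfer leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically; return True on commit, False on rollback."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit is undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


ok = transfer(conn, "alice", "bob", 30)
overdraft_ok = transfer(conn, "alice", "bob", 500)
final = balances(conn)

print(ok, overdraft_ok, final)  # True False {'alice': 70, 'bob': 80}
```

In the second call the credit to bob has already been applied when the debit fails, yet the rollback removes it, so no reader ever sees money that was created out of nothing.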

Let's take a look at the use cases, advantages, and disadvantages of nonrelational databases.

There are many types of nonrelational database models, but we will focus on

the two most popular types: key-value stores and document stores. These classes of
