Building a NoSQL-Based Web App to Collect Crowd-Sourced Data - Data Just Right: Introduction to Large-Scale Data and Analytics

Summary

The most common architecture for storing data, the relational database model, grew out
of the work of database pioneers such as Edgar Codd. The relational database model is
designed to provide consistency, a flexible query model, and predictability. Many Web
and mobile applications must handle both a constant barrage of incoming data and the
need to scale up predictably as the number of users and clients grows. As data sizes
get larger and the need for databases to tolerate faults increases, the effort needed
to scale and replicate relational database systems tends to make them impractical for
high-throughput applications with huge data volumes. A common solution to the problem
of massive amounts of data is to use alternative architectures that eschew the
traditional architectural choices of relational databases. These are often referred to
broadly as NoSQL technologies. Two of the most popular types of nonrelational database
are key-value stores and document stores. Key-value data stores allow each record in a
database to be accessed by a single key. The data does not need to match a pre-existing
schema. This architecture allows for very fast performance, but key-value stores lack
the ability to query data by value. In contrast, document stores provide the ability to
query against the contents of the document itself. Document stores are excellent
choices when the data retrieved is best used in single-document form (such as Web site
content) or when your database schema is very fluid.
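
The difference between the two access models can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the classes KeyValueStore and DocumentStore, the find method, and the sample weather records are hypothetical in-memory stand-ins, not the API of any real database.

```python
from typing import Any, Dict, List, Optional


class KeyValueStore:
    """Toy key-value store: values are opaque and reachable only by key."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # The key is the only access path; the store never inspects the value.
        return self._data.get(key)


class DocumentStore:
    """Toy document store: records are structured and can be matched by field."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def put(self, doc_id: str, doc: Dict[str, Any]) -> None:
        # No schema is enforced; each document may carry different fields.
        self._docs[doc_id] = doc

    def find(self, **criteria: Any) -> List[Dict[str, Any]]:
        # Query against document contents: keep the documents whose fields
        # equal every requested value. This is a linear scan; a real
        # document store would normally rely on indexes.
        return [
            doc for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


# Hypothetical sample data for illustration only.
kv = KeyValueStore()
kv.put("report:1", b'{"city": "Oslo", "temp_c": 4}')
kv.put("report:2", b'{"city": "Lima", "temp_c": 19}')

docs = DocumentStore()
docs.put("report:1", {"city": "Oslo", "temp_c": 4})
docs.put("report:2", {"city": "Lima", "temp_c": 19, "observer": "ana"})

by_key = kv.get("report:2")        # lookup requires knowing the key
by_value = docs.find(city="Lima")  # lookup by a field inside the document
```

Note that the second document carries an extra field without any schema change, which is the fluid-schema case described above. The key-value store could only answer the "city is Lima" question by fetching and decoding every value in application code.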

Even in the crowded world of open-source nonrelational data stores, various solutions
are designed to excel at one particular use case or another. Some database technologies
are designed to perform well under heavy load, at the expense of consistency across
nodes. Other types of databases specialize in being as easy to scale across a cluster
of machines or as flexible with schema changes as possible. For small- or medium-sized
applications that require a strong guarantee of consistency and a flexible querying
model, relational databases are still the best choice.

For applications that require high throughput for database writes, a great choice is to
use a key-value data store. Much like a hash table, key-value architecture stores data
as a collection of unique key-value pairs, resulting in very quick data storage and
retrieval. This speed comes at the expense of being able to query data by value. Unlike
in a document store, only the key can be used to access data. The most popular
open-source technology that uses this approach is Redis, which combines an in-memory
key-value system with automatic snapshots to disk. Configuring these snapshots to write
to a persistent disk provides some degree of fault tolerance. Holding a dataset
completely in memory is both a source of speed and a potential liability, but Redis can
be used in a distributed manner using client-side sharding. Twemproxy, which provides a
hashing proxy layer that automatically distributes keys to a pool of Redis instances,
is currently the best way to shard a database across separate Redis instances.
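
The idea of distributing keys to a pool of instances by hashing can be sketched as follows. This is a minimal illustration, not a reproduction of how Twemproxy routes keys: the ShardedStore class is hypothetical, plain dictionaries stand in for separate Redis instances, and simple modulo hashing is used only because it is the shortest scheme to show.

```python
import hashlib
from typing import Dict, List, Optional


class ShardedStore:
    """Routes each key to one shard in a fixed pool by hashing the key."""

    def __init__(self, shard_count: int) -> None:
        # Each dict stands in for a separate key-value server instance.
        self.shards: List[Dict[str, str]] = [{} for _ in range(shard_count)]

    def shard_index(self, key: str) -> int:
        # Use a stable hash so every client maps a key to the same shard.
        # (Python's built-in hash() is randomized per process for strings.)
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def set(self, key: str, value: str) -> None:
        self.shards[self.shard_index(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        # Reads hash the key the same way, so they reach the shard
        # that received the write.
        return self.shards[self.shard_index(key)].get(key)


# Hypothetical usage: spread 1,000 keys over a pool of three shards.
store = ShardedStore(shard_count=3)
for i in range(1000):
    store.set(f"user:{i}", f"payload-{i}")

sizes = [len(shard) for shard in store.shards]
```

With modulo hashing, changing the number of shards remaps most keys to a different shard, which is one reason production setups often prefer consistent hashing instead.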

The distributed database space is still evolving rapidly. A number of new software
solutions are combining the structure and consistency of Edgar Codd's relational model
with the potential for scalability found in key-value and document databases.
