Building a NoSQL-Based Web App to Collect Crowd-Sourced Data - Data Just Right: Introduction to Large-Scale Data and Analytics

as quickly as Redis, but its ease of scaling and its support for more complex queries

might make it more useful for some applications. Cassandra is also well

integrated with Apache Hadoop, and can be used as both an input and output for

Hadoop MapReduce jobs.

Finally, it's not impossible to distribute traditional relational databases as data sizes

get larger and larger; it's just that it can be more difficult. Some of the techniques for

sharding as described earlier can be applied to relational databases. Furthermore, a

variety of open-source and commercial relational database sharding software is available.

However, as we will see in the next section, the future might belong to new

types of database designs that accept a small set of trade-offs to attempt to provide the

best of both worlds: a blend of nonrelational scalability and relational consistency.
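To make the idea of sharding a relational database concrete, here is a minimal sketch, assuming hash-based partitioning on a single key. In-memory SQLite databases stand in for separate database servers, and the names `NUM_SHARDS`, `shard_for`, `insert_user`, `get_user`, and `count_users` are hypothetical helpers introduced only for illustration.

```python
import sqlite3
import zlib

NUM_SHARDS = 4  # hypothetical shard count chosen for illustration

# Each "shard" is an independent relational database. In-memory SQLite
# connections stand in for separate database servers.
shards = [sqlite3.connect(":memory:") for _ in range(NUM_SHARDS)]
for conn in shards:
    conn.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY, name TEXT)")


def shard_for(user_id: str) -> int:
    """Map a key to a shard index using a stable hash (CRC32)."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_SHARDS


def insert_user(user_id: str, name: str) -> None:
    """Route the write to the single shard that owns this key."""
    conn = shards[shard_for(user_id)]
    conn.execute("INSERT INTO users VALUES (?, ?)", (user_id, name))
    conn.commit()


def get_user(user_id: str):
    """Single-key lookups touch only one shard."""
    conn = shards[shard_for(user_id)]
    row = conn.execute(
        "SELECT name FROM users WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


def count_users() -> int:
    """Queries that span keys must fan out to every shard and merge results."""
    return sum(
        conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        for conn in shards
    )


for i in range(100):
    insert_user(f"user-{i}", f"Name {i}")
```

Single-key reads and writes stay cheap in this sketch, while `count_users` hints at where the difficulty comes from: anything that crosses shards (aggregates, joins, multi-row transactions) has to be coordinated by the application or by sharding middleware.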

Database administrators often prefer using SQL because it abstracts away much of the

complexity of building queries and a huge number of users are already familiar with

it. A collection of new database designs attempts to bring together features found in

distributed nonrelational databases with the consistency guarantees afforded by relational

designs. Some of these projects are (perhaps lamentably) known as “NewSQL”

database designs. As with Redis, improving performance by placing the database in

memory is another common pattern being used by next-generation systems.

VoltDB was created in part by Michael Stonebraker, who was instrumental in the

creation of the popular open-source relational database PostgreSQL and the commercial

analytical database Vertica. VoltDB is a relational, ACID-compliant database that

shares some of the performance advantages of Redis. VoltDB uses an in-memory data

model along with snapshots of the data to create persistence. Although the need for

the entire dataset to fit in memory creates the same sort of size limitations that Redis

faces, VoltDB is also designed to scale easily by simply adding more servers to the

system.
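The combination of an in-memory data model with snapshots for persistence can be sketched in a few lines. This is a toy illustration of the general pattern, not VoltDB's actual implementation; the class `SnapshottingStore`, its methods, and the JSON snapshot file are all assumptions made for the example.

```python
import json
import os
import tempfile


class SnapshottingStore:
    """Toy in-memory table that persists by writing whole-dataset snapshots."""

    def __init__(self, snapshot_path: str):
        self.snapshot_path = snapshot_path
        self.rows = {}
        # On startup, recover state from the most recent snapshot, if any.
        if os.path.exists(snapshot_path):
            with open(snapshot_path, "r", encoding="utf-8") as f:
                self.rows = json.load(f)

    def put(self, key: str, value) -> None:
        # Writes touch only memory, which is what makes them fast.
        self.rows[key] = value

    def get(self, key: str):
        return self.rows.get(key)

    def snapshot(self) -> None:
        # Write to a temporary file, then rename it into place, so a crash
        # mid-write never leaves a half-written snapshot behind.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.rows, f)
        os.replace(tmp_path, self.snapshot_path)


path = os.path.join(tempfile.mkdtemp(), "snapshot.json")
store = SnapshottingStore(path)
store.put("a", 1)
store.snapshot()
store.put("b", 2)  # written after the snapshot, so not yet durable

recovered = SnapshottingStore(path)  # simulates a restart
```

In this sketch, `recovered` sees the value for `"a"` but not for `"b"`, because anything written after the last snapshot exists only in memory. Systems built on this pattern generally need an additional mechanism, such as logging or replication, to narrow that window.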

MemSQL is another in-memory database that is growing in popularity; it uses a

variety of best practices to be performant and can be queried using standard SQL. One

interesting approach that MemSQL takes to boost query performance is to dynamically

compile SQL statements into C++ code. MemSQL then runs this generated

C++ as compiled shared libraries. MemSQL also automatically scales linearly as more machines

are added to a cluster pool.
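The idea of compiling a query into code, rather than interpreting it row by row, can be illustrated with a small analogy. The sketch below generates Python source (not C++) for a filter and compiles it once; it is not how MemSQL is implemented, and `compile_filter`, `ALLOWED_OPS`, and the tuple-based condition format are hypothetical stand-ins for a real SQL parser and code generator.

```python
# Map SQL-style comparison operators to their Python equivalents.
ALLOWED_OPS = {"=": "==", "!=": "!=", "<": "<", "<=": "<=", ">": ">", ">=": ">="}


def compile_filter(conditions):
    """Generate and compile a Python function for an AND of simple conditions.

    `conditions` is a list of (column, operator, literal) tuples, standing in
    for an already-parsed WHERE clause. Literals are assumed to be plain
    strings or numbers.
    """
    parts = []
    for column, op, literal in conditions:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        parts.append(f"row[{column!r}] {ALLOWED_OPS[op]} {literal!r}")
    body = " and ".join(parts) if parts else "True"
    source = f"def matches(row):\n    return {body}\n"

    # Compile the generated source once; the resulting function can then be
    # reused for every row without re-examining the query structure.
    namespace = {}
    exec(compile(source, "<generated-query>", "exec"), namespace)
    return namespace["matches"], source


rows = [
    {"city": "Austin", "age": 31},
    {"city": "Boston", "age": 45},
    {"city": "Austin", "age": 52},
]

# Roughly: SELECT * FROM rows WHERE city = 'Austin' AND age > 40
matches, generated_source = compile_filter(
    [("city", "=", "Austin"), ("age", ">", 40)]
)
selected = [row for row in rows if matches(row)]
```

The cost of generating and compiling the code is paid once per query shape, after which each execution runs a specialized function. A system that emits C++ and loads it as a shared library applies the same trade-off with native code.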

Google's Spanner is a database that has evolved from the company's requirements

for global computing. According to the public research paper about Spanner, the software

“provides externally consistent reads and writes, and globally-consistent reads

across the database at a timestamp” and “looks like a relational database instead of a

key-value store.” 6 Spanner is designed to scale across multiple data centers and exhibit

a high level of consistency at the expense of some latency.
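To give a feel for what "reads across the database at a timestamp" means, here is a minimal single-process sketch of a multi-version store. It is only an illustration of timestamped reads: it uses a simple counter as its clock and does not model Spanner's distributed clock handling, replication, or transactions. The class `MultiVersionStore` and its methods are hypothetical.

```python
import bisect
import itertools


class MultiVersionStore:
    """Toy store that keeps every version of a value, tagged by commit time."""

    def __init__(self):
        # A monotonically increasing counter stands in for a real clock.
        self._clock = itertools.count(1)
        # key -> (sorted list of commit timestamps, parallel list of values)
        self._versions = {}

    def write(self, key, value) -> int:
        """Append a new version and return its commit timestamp."""
        ts = next(self._clock)
        timestamps, values = self._versions.setdefault(key, ([], []))
        timestamps.append(ts)
        values.append(value)
        return ts

    def read_at(self, key, ts):
        """Return the value of `key` as of timestamp `ts`, or None."""
        timestamps, values = self._versions.get(key, ([], []))
        # Find the newest version whose commit timestamp is <= ts.
        idx = bisect.bisect_right(timestamps, ts)
        return values[idx - 1] if idx > 0 else None


db = MultiVersionStore()
t1 = db.write("balance:alice", 100)
t2 = db.write("balance:bob", 50)
t3 = db.write("balance:alice", 70)
t4 = db.write("balance:bob", 80)
```

Reading both keys with `read_at(..., t2)` yields a snapshot in which neither later update is visible, regardless of what has been written since. Providing the same guarantee across machines and data centers is the hard part, and it is where the latency cost mentioned above comes from.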
