Database Reference
as quickly as Redis, but its ease of scaling and its support for more complex
queries might make it more useful for some applications. Cassandra is also well
integrated with Apache Hadoop, and can be used as both an input and output for
Hadoop MapReduce jobs.
Finally, it's not impossible to distribute traditional relational databases as data sizes
grow; it's just more difficult. Some of the sharding techniques described earlier can
be applied to relational databases. Furthermore, a variety of open-source and
commercial software for sharding relational databases is available. However, as we
will see in the next section, the future might belong to new types of database designs
that accept a small set of trade-offs in an attempt to provide the best of both worlds:
a blend of nonrelational scalability and relational consistency.
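To make the idea of sharding a relational database concrete, here is a minimal sketch of one common approach: routing each row to a shard by hashing its key. It assumes a hypothetical `users` table and uses in-memory SQLite databases as stand-ins for separate database servers; the class and method names are illustrative and do not come from any particular sharding product.

```python
import sqlite3
import zlib


class ShardedUsers:
    """Hypothetical router that spreads a `users` table across several databases."""

    def __init__(self, shard_count):
        # Each in-memory SQLite connection stands in for a separate server.
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for conn in self.shards:
            conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    def shard_for(self, user_id):
        # A stable hash keeps a given key on the same shard across runs.
        return zlib.crc32(str(user_id).encode("utf-8")) % len(self.shards)

    def insert(self, user_id, name):
        conn = self.shards[self.shard_for(user_id)]
        conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name))
        conn.commit()

    def get(self, user_id):
        conn = self.shards[self.shard_for(user_id)]
        row = conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None

    def count_all(self):
        # Queries that span shards must fan out and be merged by the application.
        return sum(
            conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
            for conn in self.shards
        )
```

Single-key lookups touch only one shard, but anything that spans shards (such as `count_all`, or a join) has to be assembled by the application. That extra work is a large part of why distributing a relational database is more difficult.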
NewSQL: The Return of Codd
Database administrators often prefer using SQL because it abstracts away much of the
complexity of building queries, and because a huge number of users are already
familiar with it. A collection of new database designs attempts to bring together
features found in distributed nonrelational databases with the consistency guarantees
afforded by relational designs. Some of these projects are (perhaps lamentably) known
as “NewSQL” database designs. As with Redis, keeping the database in memory to
improve performance is another common pattern in these next-generation systems.
VoltDB was created in part by Michael Stonebraker, who was instrumental in the
creation of the popular open-source relational database PostgreSQL and the commercial
analytical database Vertica. VoltDB is a relational, ACID-compliant database that
gets some of the same performance boosts as Redis. VoltDB uses an in-memory data
model and persists data by taking snapshots of it. Although the need for the entire
dataset to fit in memory creates the same sort of size limitations that Redis faces,
VoltDB is also designed to scale easily by simply adding more servers to the system.
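The general pattern of keeping data in memory and persisting it through snapshots can be sketched in a few lines. This is a toy illustration only, not VoltDB's actual mechanism: the `SnapshotStore` class, its JSON file format, and its method names are all assumptions made for the example.

```python
import json
import os
import tempfile


class SnapshotStore:
    """Toy in-memory key-value store persisted by periodic snapshots."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        # On startup, recover the state captured by the last snapshot, if any.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.data = json.load(f)

    def put(self, key, value):
        # Writes touch only memory, which is what makes them fast.
        self.data[key] = value

    def get(self, key, default=None):
        return self.data.get(key, default)

    def snapshot(self):
        # Write to a temporary file first, then rename it into place so a
        # crash mid-write never leaves a half-written snapshot behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.path)
```

The trade-off is visible in the sketch: anything written after the most recent `snapshot()` call is lost if the process dies, so a real system has to decide how often to snapshot and what else to record in between.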
MemSQL is another in-memory database that is growing in popularity; it applies a
range of performance techniques and can be queried using standard SQL. One
interesting approach that MemSQL takes to boost query performance is to dynamically
compile SQL statements into C++ code. MemSQL then runs the generated C++, compiled
into shared libraries. MemSQL is also designed to scale linearly as more machines
are added to a cluster.
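The idea of compiling a query into code, rather than interpreting it row by row, can be illustrated with a small analogy. The sketch below generates Python source for a filter and compiles it once; MemSQL generates C++, so this is only a rough parallel, and the `compile_filter` helper and its tuple-based condition format are hypothetical.

```python
# Map SQL-style comparison operators to their Python equivalents.
ALLOWED_OPS = {"=": "==", "!=": "!=", "<": "<", "<=": "<=", ">": ">", ">=": ">="}


def compile_filter(conditions):
    """Turn [(column, op, value), ...] (joined by AND) into a compiled function.

    Returns the compiled predicate and the generated source text.
    """
    clauses = []
    constants = {}
    for i, (column, op, value) in enumerate(conditions):
        # Validate inputs so that only known-safe text reaches the generator.
        if not column.isidentifier():
            raise ValueError("invalid column name: %r" % (column,))
        if op not in ALLOWED_OPS:
            raise ValueError("unsupported operator: %r" % (op,))
        # Values are bound as named constants instead of being pasted into
        # the source, much like parameters in a prepared statement.
        constants["c%d" % i] = value
        clauses.append("row[%r] %s c%d" % (column, ALLOWED_OPS[op], i))

    body = " and ".join(clauses) or "True"
    source = "def matches(row):\n    return %s\n" % body

    # Compile once; the resulting function can then be reused for every row.
    namespace = dict(constants)
    exec(compile(source, "<generated query>", "exec"), namespace)
    return namespace["matches"], source


rows = [
    {"name": "Ana", "age": 34, "city": "Oslo"},
    {"name": "Ben", "age": 27, "city": "Oslo"},
    {"name": "Cy", "age": 41, "city": "Lima"},
]
matches, generated_source = compile_filter([("age", ">=", 30), ("city", "=", "Oslo")])
selected = [row["name"] for row in rows if matches(row)]
```

The cost of generating and compiling the code is paid once per query shape, after which each row is checked by ordinary compiled code with no query parsing involved.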
Google's Spanner is a database that evolved from the company's requirements for
global computing. According to the public research paper about Spanner, the software
“provides externally consistent reads and writes, and globally-consistent reads
across the database at a timestamp” and “looks like a relational database instead of a
key-value store.” 6 Spanner is designed to scale across multiple data centers and to
provide a high level of consistency at the expense of some latency.
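To give a feel for what a read "at a timestamp" means, here is a single-process toy that keeps every version of each value and answers reads as of a chosen time. It says nothing about how Spanner assigns timestamps or coordinates data centers; the `VersionedStore` class and its integer logical clock are assumptions made purely for illustration.

```python
import bisect


class VersionedStore:
    """Toy multi-version store: every write is kept, tagged with a timestamp."""

    def __init__(self):
        # key -> (sorted list of timestamps, parallel list of values)
        self._versions = {}
        self._clock = 0  # simple logical clock; a stand-in for real timestamps

    def write(self, key, value):
        self._clock += 1
        timestamps, values = self._versions.setdefault(key, ([], []))
        timestamps.append(self._clock)
        values.append(value)
        return self._clock

    def read_at(self, key, timestamp):
        # Return the newest value written at or before `timestamp`.
        timestamps, values = self._versions.get(key, ([], []))
        index = bisect.bisect_right(timestamps, timestamp)
        return values[index - 1] if index else None


store = VersionedStore()
t1 = store.write("balance", 100)
t2 = store.write("owner", "ana")
t3 = store.write("balance", 250)
```

Reading every key at the same timestamp (for example `t2`) yields a view of the whole store as it stood at that moment, regardless of writes that arrived afterward. The hard part in a distributed system, which this sketch sidesteps entirely, is agreeing on those timestamps across machines.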
6. http://research.google.com/archive/spanner-osdi2012.pdf
 
 