Introducing Cassandra - Cassandra: The Definitive Guide

is columnar or column-oriented, it might be more helpful to think of it as an indexed,

row-oriented store, as we examine more thoroughly in Chapter 3. I list the data orientation as a feature,

because there are several data models that are easy to visualize and use in a nonrelational model;

assuming that the relational model is always best, regardless of your application, is a mixture

of laziness and an invitation to far more work than necessary.

Cassandra stores data in what can be thought of for now as a multidimensional hash table. That

means you don't have to decide ahead of time precisely what your data structure must look like,

or what fields your records will need. This can be useful if you're in startup mode and are adding

or changing features with some frequency. It is also attractive if you need to support an Agile

development methodology and aren't free to take months for up-front analysis. If your business

changes and you later need to add or remove fields on the fly without disrupting service, go

ahead; Cassandra lets you.
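
As a rough illustration of the "multidimensional hash table" idea, the sketch below models rows as a dictionary of dictionaries: the outer key is a row key and the inner key is a column name. It uses plain Python dicts only; the names `users`, `put`, and `get` are hypothetical and are not part of any Cassandra client API.

```python
# Conceptual model only: plain Python dicts, not a Cassandra client API.
# Outer key = row key, inner key = column name, value = column value.
users = {}


def put(table, row_key, column, value):
    """Set one column on one row, creating the row if it does not exist."""
    table.setdefault(row_key, {})[column] = value


def get(table, row_key, column, default=None):
    """Read one column from one row, or return default if it is absent."""
    return table.get(row_key, {}).get(column, default)


put(users, "alice", "email", "alice@example.com")
put(users, "bob", "email", "bob@example.com")

# A new field shows up later. Only the row that uses it stores it;
# no existing row has to be altered or migrated.
put(users, "bob", "twitter", "@bob")
```

In this toy model, adding the `twitter` field required no up-front declaration, and the row for `alice` is unaffected by it.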

That's not to say that you don't have to think about your data, though. On the contrary, Cassandra

requires a shift in how you think about it. Instead of designing a pristine data model and then

designing queries around the model as in an RDBMS, you are free to think of your queries first, and

then provide the data that answers them.
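
One way to picture the query-first approach is to start from the questions the application will ask and keep one lookup structure per question. The sketch below assumes two hypothetical queries ("fetch a post by its id" and "list the posts by an author") and writes the same data once for each. It is a plain-dict illustration under those assumptions, not Cassandra code.

```python
# Hypothetical query-first sketch using plain dicts; not a Cassandra API.

# Query 1: "fetch a post by its id"
posts_by_id = {}
# Query 2: "list the posts written by a given author"
posts_by_author = {}


def add_post(post_id, author, title):
    """Store the post once per query it has to answer."""
    posts_by_id[post_id] = {"author": author, "title": title}
    posts_by_author.setdefault(author, {})[post_id] = title


def titles_for(author):
    """Answer query 2 with a single lookup instead of a join or scan."""
    return posts_by_author.get(author, {})


add_post("p1", "alice", "Hello")
add_post("p2", "alice", "Again")
add_post("p3", "bob", "Hi")
```

The trade-off in this sketch is duplicated data on write in exchange for a direct lookup on read.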

Schema-Free

Cassandra requires you to define an outer container, called a keyspace, that contains column

families. The keyspace is essentially just a logical namespace to hold column families and certain

configuration properties. The column families are names for associated data and a sort order.

Beyond that, the data tables are sparse, so you can just start adding data to them, using the columns

that you want; there's no need to define your columns ahead of time. Instead of modeling data up

front using expensive data modeling tools and then writing queries with complex join statements,

Cassandra asks you to model the queries you want, and then provide the data around them.
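
The containment described above (keyspace, then column families, then sparse rows) can be sketched with nested dictionaries. In this toy model the column families must exist before data is added, but the columns need not. The names `Blog`, `Users`, `Posts`, `insert`, and `get_row` are hypothetical, and sorting columns by name is an assumption made here to stand in for the column family's sort order. None of this is a real Cassandra API.

```python
# Conceptual sketch with plain dicts; all names are hypothetical.
keyspace = {
    "name": "Blog",  # logical namespace
    "column_families": {"Users": {}, "Posts": {}},
}


def insert(ks, column_family, row_key, columns):
    """Add columns to a row. The column family must already be defined;
    the individual columns do not have to be."""
    cf = ks["column_families"][column_family]  # KeyError if undefined
    cf.setdefault(row_key, {}).update(columns)


def get_row(ks, column_family, row_key):
    """Return a row's columns as (name, value) pairs sorted by column name
    (assumed sort order for this sketch)."""
    row = ks["column_families"][column_family].get(row_key, {})
    return sorted(row.items())


# Rows in the same column family may carry different columns (sparse).
insert(keyspace, "Users", "alice", {"email": "alice@example.com"})
insert(keyspace, "Users", "bob", {"twitter": "@bob", "email": "bob@example.com"})
```

Here `get_row(keyspace, "Users", "bob")` returns the `email` pair before the `twitter` pair, while the row for `alice` holds only an `email` column.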

High Performance

Cassandra was designed specifically from the ground up to take full advantage of multiprocessor/

multicore machines, and to run across many dozens of these machines housed in multiple data

centers. It scales consistently and seamlessly to hundreds of terabytes. Cassandra has been shown

to perform exceptionally well under heavy load. It consistently shows very high write throughput

on a basic commodity workstation. As you add more servers, you can maintain

all of Cassandra's desirable properties without sacrificing performance.
