is columnar or column-oriented, it might be more helpful to think of it as an indexed,
row-oriented store, as we examine more thoroughly in Chapter 3. I list the data orientation as a
feature because there are several data models that are easy to visualize and use in a nonrelational
model; simply assuming that the relational model is always best, regardless of your application,
is a mixture of laziness and an invitation to far more work than necessary.
Cassandra stores data in what can be thought of for now as a multidimensional hash table. That
means you don't have to decide ahead of time precisely what your data structure must look like,
or what fields your records will need. This can be useful if you're in startup mode and are adding
or changing features with some frequency. It is also attractive if you need to support an Agile
development methodology and aren't free to take months for up-front analysis. If your business
changes and you later need to add or remove fields on the fly without disrupting service, go
ahead; Cassandra lets you.
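To make the "multidimensional hash table" picture concrete, here is a minimal sketch in Python.
It is a toy in-memory model only, not Cassandra's storage engine or client API; the names users
and columns_of are hypothetical and exist just for this illustration.

```python
from collections import defaultdict

# Toy model: row key -> {column name -> value}.
# Illustration of the idea only; this is not how you talk to Cassandra.
users = defaultdict(dict)

users["alice"]["email"] = "alice@example.com"
users["alice"]["city"] = "Austin"

# A later row can carry a column that no earlier row has;
# nothing had to be declared ahead of time.
users["bob"]["email"] = "bob@example.com"
users["bob"]["twitter"] = "@bob"

# Dropping a field from one row leaves every other row untouched.
del users["alice"]["city"]


def columns_of(row_key):
    """Return the sorted column names actually present in one row."""
    return sorted(users[row_key])
```

Each row holds only the fields it actually uses, which is why adding or dropping a field for one
record requires no change to any other record.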
That's not to say that you don't have to think about your data, though. On the contrary, Cassandra
requires a shift in how you think about it. Instead of designing a pristine data model and then
designing queries around the model, as you would with an RDBMS, you are free to think of your
queries first, and then provide the data that answers them.
Schema-Free
Cassandra requires you to define an outer container, called a keyspace, that contains column
families. The keyspace is essentially just a logical namespace to hold column families and certain
configuration properties. The column families are names for associated data and a sort order.
Beyond that, the data tables are sparse, so you can just start adding data to them, using the
columns that you want; there's no need to define your columns ahead of time. Instead of modeling
data up front using expensive data modeling tools and then writing queries with complex join
statements, Cassandra asks you to model the queries you want, and then provide the data around them.
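As a rough sketch of that containment and of query-first modeling, the Python below builds a
keyspace holding one column family whose columns stay sorted by name. Everything here is an
assumption made for illustration: the ColumnFamily class, its insert and get_slice methods, and
the Blog and PostsByAuthor names are hypothetical and are not Cassandra's API. The layout was
chosen to answer one query, "which posts did this author write in a given date range?"

```python
import bisect


class ColumnFamily:
    """Hypothetical toy: a named set of sparse rows with columns sorted by name."""

    def __init__(self, name):
        self.name = name
        self.rows = {}  # row key -> list of (column name, value), kept sorted

    def insert(self, row_key, column, value):
        row = self.rows.setdefault(row_key, [])
        names = [c for c, _ in row]
        i = bisect.bisect_left(names, column)
        if i < len(row) and row[i][0] == column:
            row[i] = (column, value)        # overwrite an existing column
        else:
            row.insert(i, (column, value))  # new column, inserted in sort order

    def get_slice(self, row_key, start, finish):
        """Return columns with start <= name <= finish, in sort order."""
        return [(c, v) for c, v in self.rows.get(row_key, []) if start <= c <= finish]


# The keyspace is modeled as a plain namespace that holds column families.
keyspace = {"name": "Blog", "column_families": {}}
keyspace["column_families"]["PostsByAuthor"] = ColumnFamily("PostsByAuthor")

posts = keyspace["column_families"]["PostsByAuthor"]
posts.insert("alice", "2010-03-01", "Hello world")
posts.insert("alice", "2010-01-15", "First post")
posts.insert("alice", "2010-02-10", "Second post")

# The query the layout was designed for: one row read, one ordered range.
january_to_february = posts.get_slice("alice", "2010-01-01", "2010-02-28")
```

Because the row key is the author and the column names are dates, the range query needs no join:
the data was arranged around the question rather than the other way around.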
High Performance
Cassandra was designed from the ground up to take full advantage of multiprocessor/multicore
machines, and to run across many dozens of these machines housed in multiple data centers. It
scales consistently and seamlessly to hundreds of terabytes. Cassandra has been shown to perform
exceptionally well under heavy load, and it consistently shows very high write throughput even on
a basic commodity workstation. As you add more servers, you can maintain all of Cassandra's
desirable properties without sacrificing performance.