The Cassandra Data Model - Cassandra: The Definitive Guide

On the server side, columns are immutable in order to prevent multithreading issues. The column is defined in Cassandra by the org.apache.cassandra.db.IColumn interface, which allows a variety of operations, including getting the value of the column as a byte array or, in the case of a super column, getting its subcolumns as a Collection<IColumn> and finding the time of the most recent change.
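The following sketch shows what immutability means for a column. The class `SimpleColumn` and its methods are hypothetical illustrations loosely modeled on the operations described above. They are not the real `IColumn` interface or any of its implementations. The sketch assumes that a column is just a name, a byte-array value, and the timestamp of its most recent change.

```java
// Hypothetical immutable column, loosely modeled on the operations the text
// attributes to IColumn. This is NOT Cassandra's actual implementation.
final class SimpleColumn {
    private final byte[] name;
    private final byte[] value;
    private final long timestamp; // time of the most recent change

    SimpleColumn(byte[] name, byte[] value, long timestamp) {
        // Defensive copies: later changes to the caller's arrays cannot leak in.
        this.name = name.clone();
        this.value = value.clone();
        this.timestamp = timestamp;
    }

    byte[] name() { return name.clone(); }

    // Hand out a copy so callers cannot modify the stored bytes.
    byte[] value() { return value.clone(); }

    long timestamp() { return timestamp; }

    // An "update" builds a new column rather than mutating this one, so a
    // thread still holding the old instance never observes a partial write.
    SimpleColumn withValue(byte[] newValue, long newTimestamp) {
        return new SimpleColumn(name, newValue, newTimestamp);
    }

    // Of two versions of the same column, keep the one changed most recently.
    static SimpleColumn newer(SimpleColumn a, SimpleColumn b) {
        return a.timestamp >= b.timestamp ? a : b;
    }
}
```

No instance changes after it is constructed, so threads can share columns without locking. The `newer` helper is a simplified picture of how Cassandra uses column timestamps to choose between conflicting writes.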

In a relational database, rows are stored together. This wasn't the case for early versions of Cassandra, but as of version 0.6, rows for the same column family are stored together on disk.

NOTE

You cannot perform joins in Cassandra. If you have designed a data model and find that you need something like a join, you'll have to either do the work on the client side or create a denormalized second column family that represents the join results for you. The denormalized approach is common among Cassandra users. Performing joins on the client should be a very rare case; you really want to duplicate (denormalize) the data instead.
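As a minimal sketch of the denormalized approach described above, two in-memory maps can stand in for two column families. The names `Users`, `UsersByCity`, and `DenormalizedStore` are hypothetical. A real client would send the same two writes to Cassandra rather than update local maps. The point is that the client writes the data once for each query it needs to answer, so the query itself never requires a join.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// In-memory stand-in for two column families. The names "Users" and
// "UsersByCity" are hypothetical; a real client would send the same two
// writes to Cassandra instead of updating local maps.
final class DenormalizedStore {
    // "Users": row key = username, columns = attribute name -> value
    final Map<String, SortedMap<String, String>> users = new TreeMap<>();
    // "UsersByCity": row key = city, column names = usernames (values unused)
    final Map<String, SortedMap<String, String>> usersByCity = new TreeMap<>();

    // Write the data once per query pattern; this duplication replaces the join.
    void addUser(String username, String email, String city) {
        SortedMap<String, String> row =
            users.computeIfAbsent(username, k -> new TreeMap<>());
        row.put("email", email);
        row.put("city", city);
        usersByCity.computeIfAbsent(city, k -> new TreeMap<>()).put(username, "");
    }

    // "Who lives in this city?" becomes a single-row read with no join.
    List<String> usersIn(String city) {
        SortedMap<String, String> row = usersByCity.get(city);
        if (row == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(row.keySet());
    }
}
```

The trade-off is that the client now owns consistency. For example, if a user moves, the client must remove the username from the old city's row as well as write it to the new one.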

Wide Rows, Skinny Rows

When designing a table in a traditional relational database, you're typically dealing with “entities,” or the set of attributes that describe a particular noun (Hotel, User, Product, etc.). Not much thought is given to the size of the rows themselves, because row size isn't negotiable once you've decided what noun your table represents. However, when you're working with Cassandra, you actually have a decision to make about the size of your rows: they can be wide or skinny, depending on the number of columns the row contains.

A wide row means a row that has lots and lots (perhaps tens of thousands or even millions) of columns. Typically there is a small number of rows that go along with so many columns. Conversely, you could have something closer to a relational model, where you define a smaller number of columns and use many different rows. That's the skinny model.

Wide rows typically contain automatically generated column names (like UUIDs or timestamps) and are used to store lists of things. Consider a monitoring application as an example: you might have a row that represents a time slice of an hour by using a modified timestamp as a row key, and then store columns representing IP addresses that accessed your application within that interval. You can then create a new row key after an hour elapses.
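The monitoring example above can be sketched in memory as follows. Several details are assumptions for illustration and do not come from the text: the "modified timestamp" row key is the access hour in UTC formatted as `yyyyMMddHH`, each column value is a hit count, and the class and method names are hypothetical. Every distinct IP address becomes one column in the current hour's row, and a new row key begins as soon as an access falls in the next hour.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Collections;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// In-memory sketch of a wide-row access log; all names are hypothetical.
final class HourlyAccessLog {
    // Assumed row-key format: the hour of the access in UTC, e.g. "2024031514".
    private static final DateTimeFormatter HOUR_KEY =
        DateTimeFormatter.ofPattern("yyyyMMddHH").withZone(ZoneOffset.UTC);

    // row key -> (column name = IP address -> column value = hit count)
    private final Map<String, SortedMap<String, Long>> rows = new TreeMap<>();

    static String rowKeyFor(Instant when) {
        return HOUR_KEY.format(when);
    }

    // Each distinct IP is one column in its hour's row; crossing an hour
    // boundary produces a different row key, i.e. a new row.
    void recordAccess(String ip, Instant when) {
        rows.computeIfAbsent(rowKeyFor(when), k -> new TreeMap<>())
            .merge(ip, 1L, Long::sum);
    }

    SortedMap<String, Long> row(String rowKey) {
        return rows.getOrDefault(rowKey, Collections.emptySortedMap());
    }
}
```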

Skinny rows are slightly more like traditional RDBMS rows, in that each row will contain similar sets of column names. They differ from RDBMS rows, however, because all columns are essentially optional.
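A minimal in-memory sketch of skinny rows follows, using hotels as the noun. The class name `SkinnyTable` and the row keys and column names in the example are hypothetical. Each row draws from the same small set of column names, but the table stores only the columns a row actually supplies. A missing column is simply absent, with no placeholder NULL.

```java
import java.util.Map;
import java.util.Optional;
import java.util.SortedMap;
import java.util.TreeMap;

// In-memory sketch of a skinny-row column family; names are hypothetical.
final class SkinnyTable {
    private final Map<String, SortedMap<String, String>> rows = new TreeMap<>();

    // Store only the columns actually supplied; nothing forces every row
    // to carry every column.
    void put(String rowKey, Map<String, String> columns) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).putAll(columns);
    }

    // A column that was never written reads back as empty, not as NULL.
    Optional<String> get(String rowKey, String column) {
        SortedMap<String, String> row = rows.get(rowKey);
        return row == null ? Optional.empty() : Optional.ofNullable(row.get(column));
    }

    int columnCount(String rowKey) {
        SortedMap<String, String> row = rows.get(rowKey);
        return row == null ? 0 : row.size();
    }
}
```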
