On the server side, columns are immutable in order to prevent multithreading issues. The column
is defined in Cassandra by the org.apache.cassandra.db.IColumn interface, which allows a
variety of operations, including getting the value of the column as a byte array or, in the case of
a super column, getting its subcolumns as a Collection<IColumn> and finding the time of the
most recent change.
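The following minimal Java sketch shows why immutability sidesteps multithreading issues. The class SimpleColumn and its methods are hypothetical and deliberately simplified. They are not the real org.apache.cassandra.db.IColumn interface. Every field is final, arrays are defensively copied, and an "update" produces a new instance. As a result, one column object can be read by many threads without locking.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified column type; NOT Cassandra's actual IColumn API.
// All fields are final and arrays/collections are defensively copied,
// so instances can be shared across threads without synchronization.
final class SimpleColumn {
    private final byte[] name;
    private final byte[] value;
    private final long timestamp;                 // time of this column's last write
    private final List<SimpleColumn> subColumns;  // non-empty only for a "super column"

    SimpleColumn(byte[] name, byte[] value, long timestamp) {
        this(name, value, timestamp, Collections.emptyList());
    }

    SimpleColumn(byte[] name, byte[] value, long timestamp, Collection<SimpleColumn> subColumns) {
        this.name = name.clone();
        this.value = value.clone();
        this.timestamp = timestamp;
        this.subColumns = Collections.unmodifiableList(new ArrayList<>(subColumns));
    }

    byte[] name() { return name.clone(); }    // copy so callers cannot mutate internal state
    byte[] value() { return value.clone(); }

    Collection<SimpleColumn> subColumns() { return subColumns; } // already unmodifiable

    // Time of the most recent change: newest timestamp of this column or any subcolumn.
    long mostRecentChange() {
        long latest = timestamp;
        for (SimpleColumn c : subColumns) {
            latest = Math.max(latest, c.mostRecentChange());
        }
        return latest;
    }

    // "Changing" a column returns a new instance; this one is never modified.
    SimpleColumn withValue(byte[] newValue, long newTimestamp) {
        return new SimpleColumn(name, newValue, newTimestamp, subColumns);
    }
}
```

No method mutates state, so a reader thread can never observe a half-written value. The cost is one allocation per change, which is the usual trade-off of immutable designs.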
In a relational database, rows are stored together. This wasn't the case for early versions of
Cassandra, but as of version 0.6, rows for the same column family are stored together on disk.
NOTE
You cannot perform joins in Cassandra. If you have designed a data model and find that you need
something like a join, you'll have to either do the work on the client side or create a denormalized
second column family that represents the join results for you. The denormalized approach is common
among Cassandra users. Performing joins on the client should be rare; in most cases you want to
duplicate (denormalize) the data instead.
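The denormalization advice can be sketched in Java with plain in-memory maps standing in for column families. Each map goes from row key to a sorted map of column names to values. The class DenormalizedStore, the column families orders and ordersByUser, and the column names are all hypothetical, and this is not a Cassandra client API. The question "which orders belong to this user?" would be a join in a relational schema. The sketch answers it two ways: with a client-side scan, and with a single row lookup in a second column family that is written at the same time as the original.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model: a "column family" maps row key -> (column name -> value).
final class DenormalizedStore {
    // Primary column family: row key = order id, columns "userId" and "total".
    final Map<String, SortedMap<String, String>> orders = new HashMap<>();
    // Denormalized column family: row key = user id, one column per order id,
    // column value = the order total copied from the primary row.
    final Map<String, SortedMap<String, String>> ordersByUser = new HashMap<>();

    private static void put(Map<String, SortedMap<String, String>> cf,
                            String rowKey, String column, String value) {
        cf.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Write path: store the order normally AND copy it into the denormalized
    // column family, so later reads never need a join.
    void addOrder(String orderId, String userId, String total) {
        put(orders, orderId, "userId", userId);
        put(orders, orderId, "total", total);
        put(ordersByUser, userId, orderId, total);
    }

    // Denormalized read: a single row lookup.
    SortedMap<String, String> ordersFor(String userId) {
        return ordersByUser.getOrDefault(userId, new TreeMap<>());
    }

    // Client-side "join" for comparison: scans every order row.
    SortedMap<String, String> ordersForByScan(String userId) {
        SortedMap<String, String> result = new TreeMap<>();
        for (Map.Entry<String, SortedMap<String, String>> row : orders.entrySet()) {
            if (userId.equals(row.getValue().get("userId"))) {
                result.put(row.getKey(), row.getValue().get("total"));
            }
        }
        return result;
    }
}
```

The denormalized version writes each order twice. In exchange for the extra storage and write work, the read touches one row instead of scanning all of them.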
Wide Rows, Skinny Rows
When designing a table in a traditional relational database, you're typically dealing with “entities,”
or the set of attributes that describe a particular noun (Hotel, User, Product, etc.). Not much
thought is given to the size of the rows themselves, because row size isn't negotiable once you've
decided what noun your table represents. However, when you're working with Cassandra, you
actually have a decision to make about the size of your rows: they can be wide or skinny,
depending on the number of columns the row contains.
A wide row means a row that has lots and lots (perhaps tens of thousands or even millions) of
columns. Typically, a wide-row design has relatively few rows, each carrying that many columns.
Conversely, you could have something closer to a relational model, where you define a smaller
number of columns and use many different rows; that's the skinny model.
Wide rows typically contain automatically generated column names (like UUIDs or timestamps) and
are used to store lists of things. Consider a monitoring application as an example: you might have a
row that represents a one-hour time slice by using a modified timestamp as a row key, and then
store columns representing IP addresses that accessed your application within that interval. When
the hour elapses, you start writing to a new row key.
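The hourly monitoring example can be sketched as follows. It assumes the "modified timestamp" is the access time truncated to the hour and formatted in UTC (for example, 2010-06-01T14). That format is an illustrative choice, not one given in the text. The class HourlyAccessLog and its methods are hypothetical, and a map of sorted maps again stands in for the column family.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of a wide-row column family for access monitoring.
// Row key: access time truncated to the hour (assumed format "yyyy-MM-dd'T'HH", UTC).
// Columns: one per client IP address; value = epoch millis of the latest recorded access.
final class HourlyAccessLog {
    private static final DateTimeFormatter HOUR_KEY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH").withZone(ZoneOffset.UTC);

    private final Map<String, SortedMap<String, Long>> rows = new HashMap<>();

    static String rowKeyFor(Instant accessTime) {
        return HOUR_KEY.format(accessTime.truncatedTo(ChronoUnit.HOURS));
    }

    void recordAccess(String ipAddress, Instant accessTime) {
        // When the hour rolls over, rowKeyFor yields a new key, so a new row starts.
        rows.computeIfAbsent(rowKeyFor(accessTime), k -> new TreeMap<>())
            .put(ipAddress, accessTime.toEpochMilli());
    }

    // All IP addresses seen during the hour that contains the given instant.
    SortedMap<String, Long> accessesInHour(Instant anyTimeInHour) {
        return rows.getOrDefault(rowKeyFor(anyTimeInHour), new TreeMap<>());
    }

    int rowCount() { return rows.size(); }
}
```

Bucketing by hour keeps any single row from growing without bound. A single row read still returns everything seen in that interval.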
Skinny rows are slightly more like traditional RDBMS rows, in that each row will contain similar
sets of column names. They differ from RDBMS rows, however, because all columns are essentially
optional.
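For contrast, a skinny-row layout has one row per entity. Most column names are shared between rows, but nothing forces a row to contain any particular column. This hypothetical sketch (SkinnyUsers and its column names are illustrative, not from the original) models that behavior. A column that was never written is simply absent from its row, and reads report it with Optional.empty() rather than a stored NULL.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory "users" column family with skinny rows:
// one row per user, a small and mostly shared set of column names,
// but every column is optional.
final class SkinnyUsers {
    private final Map<String, SortedMap<String, String>> rows = new HashMap<>();

    void set(String userKey, String column, String value) {
        rows.computeIfAbsent(userKey, k -> new TreeMap<>()).put(column, value);
    }

    // A column that was never written is not stored at all; report it as empty.
    Optional<String> get(String userKey, String column) {
        SortedMap<String, String> row = rows.get(userKey);
        return row == null ? Optional.empty() : Optional.ofNullable(row.get(column));
    }

    // Number of columns actually present in this user's row.
    int columnCount(String userKey) {
        SortedMap<String, String> row = rows.get(userKey);
        return row == null ? 0 : row.size();
    }
}
```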