The Cassandra Data Model - Cassandra: The Definitive Guide


NOTE

Support for secondary indexes is currently being added to Cassandra 0.7. This allows you to create
indexes on column values. So, if you want to see all the users who live in a given city, for example,
secondary index support will save you from doing it from scratch.
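
Doing it "from scratch" usually means maintaining a second column family by hand, keyed by the value you want to look up. The following is a minimal sketch in plain Python, with dictionaries standing in for column families; the names users, users_by_city, add_user, and users_in_city are hypothetical, and nothing here is Cassandra API.

```python
# Plain dictionaries stand in for column families; this is not Cassandra API.
users = {}          # row key: user id -> that user's columns
users_by_city = {}  # row key: city -> {user id: ""} (hand-maintained index)

def add_user(user_id, name, city):
    """Write the user row and the matching index entry together."""
    users[user_id] = {"name": name, "city": city}
    users_by_city.setdefault(city, {})[user_id] = ""

def users_in_city(city):
    """Answer 'who lives in this city?' with a single index-row lookup."""
    return sorted(users_by_city.get(city, {}))

add_user("u1", "Alice", "Phoenix")
add_user("u2", "Bob", "Tucson")
add_user("u3", "Carol", "Phoenix")
```

Every write has to touch both structures, which is the bookkeeping that built-in index support would take over.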

Sorting Is a Design Decision

In an RDBMS, you can easily change the order in which records are returned to you by using ORDER
BY in your query. The default sort order is not configurable; without ORDER BY, records are typically
returned in the order in which they were written, though this is not guaranteed. If you want to change
the order, you just modify your query, and you can sort by any list of columns. In Cassandra, however,
sorting is treated differently; it is a design decision. Column family definitions include a CompareWith
element, which dictates the order in which the columns within each row will be sorted on reads, but this
is not configurable per query.

Where an RDBMS constrains you to sorting based on the data type stored in the column, Cassandra
only stores byte arrays, so that approach doesn't make sense. What you can do, however, is sort
as if the column were one of several different types (ASCII, Long integer, TimestampUUID,
lexicographically, etc.). You can also use your own pluggable comparator for sorting if you wish.
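
To see why the choice of comparator matters, consider the same byte arrays ordered two ways. This is an illustrative sketch in plain Python, not Cassandra code; the helper names compare_as_bytes and compare_as_long are hypothetical, and the 8-byte big-endian encoding is an assumption made for the example.

```python
import struct

# Column names held as opaque byte arrays. For this example, each name is
# the 8-byte big-endian encoding of a signed integer (an assumption).
names = [struct.pack(">q", n) for n in (10, 9, -1, 256)]

def compare_as_bytes(name):
    """Sort key: the raw bytes, compared byte by byte."""
    return name

def compare_as_long(name):
    """Sort key: the same bytes interpreted as a signed 64-bit integer."""
    return struct.unpack(">q", name)[0]

def decode(sorted_names):
    """Turn the byte arrays back into integers so the order is readable."""
    return [struct.unpack(">q", n)[0] for n in sorted_names]

by_bytes = decode(sorted(names, key=compare_as_bytes))
by_long = decode(sorted(names, key=compare_as_long))

print(by_bytes)  # [9, 10, 256, -1]
print(by_long)   # [-1, 9, 10, 256]
```

The stored bytes are identical in both cases; only the interpretation applied at comparison time differs, which is why the comparator has to be chosen when the column family is designed.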

Otherwise, there is no support for ORDER BY and GROUP BY statements in Cassandra as there is
in SQL. There is a query type called a SliceRange, which we examine in Chapter 4; it is similar
to ORDER BY in that it lets you reverse the order in which results are returned.
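
A SliceRange carries a start, a finish, a reversed flag, and a count. The sketch below imitates that behavior over an in-memory list in plain Python; the function slice_range and its parameter reverse_order (standing in for the reversed flag) are hypothetical, and the inclusive bounds and the use of None for "unbounded" are assumptions of this example rather than a statement of the exact API.

```python
def slice_range(columns, start=None, finish=None, reverse_order=False, count=100):
    """Return up to `count` (name, value) pairs between start and finish.

    `columns` must already be sorted in comparator order. When
    reverse_order is True the walk runs from high to low, so `start`
    is the upper bound and `finish` is the lower bound.
    """
    low, high = (finish, start) if reverse_order else (start, finish)
    selected = [
        (name, value)
        for name, value in columns
        if (low is None or name >= low) and (high is None or name <= high)
    ]
    if reverse_order:
        selected.reverse()
    return selected[:count]

row = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]

first_two = slice_range(row, count=2)
last_two = slice_range(row, reverse_order=True, count=2)
middle_reversed = slice_range(row, start="c", finish="b", reverse_order=True)
```

Note that reversal only flips the direction of the walk over columns that are already sorted; it does not let you pick a different sort key at query time.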

Denormalization

In relational database design, we are often taught the importance of normalization. This is not an
advantage when working with Cassandra because it performs best when the data model is denormalized.
It is often the case that companies end up denormalizing data in a relational database.
There are two common reasons for this. One is performance. Companies simply can't get the
performance they need when they have to do so many joins on years' worth of data, so they
denormalize along the lines of known queries. This ends up working, but goes against the grain of
how relational databases are intended to be designed, and ultimately makes one question whether
using a relational database is the best approach in these circumstances.
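
As a rough illustration of denormalizing along the lines of a known query, the sketch below contrasts a read-time join with a structure written in the shape the query needs. It is plain Python with made-up data; the names customers, orders, orders_with_names_joined, and orders_by_customer are all hypothetical.

```python
# Normalized: two "tables" that must be joined at read time.
customers = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 25.0},
    {"order_id": 101, "customer_id": 2, "total": 40.0},
    {"order_id": 102, "customer_id": 1, "total": 15.0},
]

def orders_with_names_joined():
    """Resolve each order's customer at query time (the join)."""
    return [
        (o["order_id"], customers[o["customer_id"]]["name"], o["total"])
        for o in orders
    ]

# Denormalized for the known query "all orders for a given customer name":
# the customer name is copied into the structure when the data is written.
orders_by_customer = {}
for o in orders:
    name = customers[o["customer_id"]]["name"]
    orders_by_customer.setdefault(name, []).append((o["order_id"], o["total"]))
```

Reading orders_by_customer["Alice"] needs no join, at the cost of duplicating the customer name on every write.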

A second reason that relational databases get denormalized on purpose is a business document
structure that requires retention. That is, you have an enclosing table that refers to a lot of external
tables whose data could change over time, but you need to preserve the enclosing document as
a snapshot in history. The common example here is with invoices. You already have Customer
and Product tables, and you'd think that you could just make an invoice that refers to those tables.
But this should never be done in practice. Customer or price information could change, and then
