NOTE
Support for secondary indexes is currently being added to Cassandra 0.7. This allows you to create
indexes on column values. So, if you want to see all the users who live in a given city, for example,
secondary index support will save you from building and maintaining that index yourself.
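To make "doing it from scratch" concrete, here is a minimal sketch of a hand-maintained index, assuming plain Python dictionaries stand in for two column families. The names users, users_by_city, save_user, and users_in_city are hypothetical and are not part of any Cassandra API.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two column families.
users = {}                        # row key (user id) -> columns
users_by_city = defaultdict(set)  # row key (city) -> ids of users living there


def save_user(user_id, name, city):
    """Write the user row and keep the hand-built index in step with it."""
    old = users.get(user_id)
    if old is not None and old["city"] != city:
        # The user moved: remove the stale index entry before adding the new one.
        users_by_city[old["city"]].discard(user_id)
    users[user_id] = {"name": name, "city": city}
    users_by_city[city].add(user_id)


def users_in_city(city):
    """Answer 'who lives in this city?' from the index, without scanning users."""
    return sorted(users_by_city.get(city, ()))


save_user("u1", "Alice", "Scottsdale")
save_user("u2", "Bob", "Phoenix")
save_user("u3", "Carol", "Scottsdale")
save_user("u1", "Alice", "Phoenix")  # u1 moves, so the index must follow
```

Every write has to touch both structures, and an update has to clean up the old index entry. That bookkeeping is the work a built-in secondary index would take over.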
Sorting Is a Design Decision
In an RDBMS, you can easily change the order in which records are returned to you by using ORDER
BY in your query. The default sort order is not configurable; by default, records are returned in
the order in which they are written. If you want to change the order, you just modify your query,
and you can sort by any list of columns. In Cassandra, however, sorting is treated differently; it
is a design decision. Column family definitions include a CompareWith element, which dictates
the order in which the columns of each row will be sorted on reads, but this is not configurable per query.
Where an RDBMS constrains you to sorting based on the data type stored in the column, Cassandra
only stores byte arrays, so that approach doesn't make sense. What you can do, however, is sort
as if the column were one of several different types (ASCII, Long integer, TimestampUUID,
lexicographically, etc.). You can also use your own pluggable comparator for sorting if you wish.
Otherwise, there is no support for ORDER BY and GROUP BY statements in Cassandra as there is
in SQL. There is a query type called a SliceRange, which we examine in Chapter 4; it is similar
to ORDER BY in that it allows a reversal.
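The following is a rough sketch of why sort order is fixed by design rather than by query, assuming a toy in-memory ColumnFamily class. The class, its get_slice method, and the bytes_key and long_key comparators are hypothetical illustrations loosely modelled on the idea of a comparator and a reversible slice, not Cassandra's actual API.

```python
import struct


def bytes_key(name):
    """Compare column names as raw bytes."""
    return name


def long_key(name):
    """Compare column names as signed 8-byte big-endian integers."""
    return struct.unpack(">q", name)[0]


class ColumnFamily:
    """Hypothetical toy: the comparator is chosen once, at definition time."""

    def __init__(self, compare_with):
        self._key = compare_with
        self._rows = {}

    def insert(self, row_key, name, value):
        self._rows.setdefault(row_key, {})[name] = value

    def get_slice(self, row_key, count, reversed_=False):
        # The caller may reverse or limit the result, but cannot pick another ordering.
        columns = sorted(self._rows.get(row_key, {}).items(),
                         key=lambda kv: self._key(kv[0]),
                         reverse=reversed_)
        return columns[:count]


def decode(columns):
    """Turn the stored byte-array names back into integers for display."""
    return [struct.unpack(">q", name)[0] for name, _ in columns]


as_bytes = ColumnFamily(bytes_key)
as_long = ColumnFamily(long_key)
for n in (2, -1, 10, 256):
    name = struct.pack(">q", n)  # what is stored is just a byte array
    as_bytes.insert("row1", name, b"")
    as_long.insert("row1", name, b"")

print(decode(as_bytes.get_slice("row1", 10)))                 # [2, 10, 256, -1]
print(decode(as_long.get_slice("row1", 10)))                  # [-1, 2, 10, 256]
print(decode(as_long.get_slice("row1", 2, reversed_=True)))   # [256, 10]
```

The same stored bytes come back in a different order depending on the comparator, and the only ordering choice left to the reader is whether to reverse it.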
Denormalization
In relational database design, we are often taught the importance of normalization. This is not an
advantage when working with Cassandra because it performs best when the data model is denormalized.
It is often the case that companies end up denormalizing data in a relational database.
There are two common reasons for this. One is performance. Companies simply can't get the
performance they need when they have to do so many joins on years' worth of data, so they
denormalize along the lines of known queries. This ends up working, but goes against the grain of
how relational databases are intended to be designed, and ultimately makes one question whether
using a relational database is the best approach in these circumstances.
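As an illustration of denormalizing along the lines of known queries, here is a small sketch that assumes two queries are known in advance: orders for a customer and orders for a product. The dictionaries orders_by_customer and orders_by_product and the function record_order are hypothetical names chosen for this example.

```python
# One hypothetical structure per known query; each holds everything that query needs.
orders_by_customer = {}  # customer id -> list of order summaries
orders_by_product = {}   # product id -> list of order summaries


def record_order(order_id, customer, product, quantity):
    """Write the same facts once per query so that reads never need a join."""
    summary = {
        "order_id": order_id,
        "customer_name": customer["name"],  # copied, not referenced
        "product_name": product["name"],    # copied, not referenced
        "quantity": quantity,
    }
    orders_by_customer.setdefault(customer["id"], []).append(dict(summary))
    orders_by_product.setdefault(product["id"], []).append(dict(summary))


alice = {"id": "c1", "name": "Alice"}
widget = {"id": "p1", "name": "Widget"}
gadget = {"id": "p2", "name": "Gadget"}

record_order("o1", alice, widget, 3)
record_order("o2", alice, gadget, 1)

# Each known query is now a single lookup.
print([o["product_name"] for o in orders_by_customer["c1"]])  # ['Widget', 'Gadget']
print([o["customer_name"] for o in orders_by_product["p2"]])  # ['Alice']
```

The cost is duplicated data and more work on every write, which is the trade made in exchange for reads that do not join.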
A second reason that relational databases get denormalized on purpose is a business document
structure that requires retention. That is, you have an enclosing table that refers to a lot of
external tables whose data could change over time, but you need to preserve the enclosing document as
a snapshot in history. The common example here is with invoices. You already have Customer
and Product tables, and you'd think that you could just make an invoice that refers to those tables.
But this should never be done in practice. Customer or price information could change, and then