you would lose the integrity of the Invoice document as it was on the invoice date, which could
violate audits, reports, or laws, and cause other problems.
In the relational world, denormalization violates Codd's normal forms, and we try to avoid it.
But in Cassandra, denormalization is, well, perfectly normal. It's not required if your data model
is simple. But don't be afraid of it.
The important point is that instead of modeling the data first and then writing queries, with
Cassandra you model the queries and let the data be organized around them. Think of the most
common query paths your application will use, and then create the column families that you need
to support them.
Detractors have suggested that this is a problem. But it is perfectly reasonable to expect that you
should think hard about the queries in your application, just as you would, presumably, think
hard about your relational domain. You may get it wrong, and then you'll have problems in either
world. Or your query needs might change over time, and then you'll have to work to update your
data set. But this is no different from defining the wrong tables, or needing additional tables, in
an RDBMS.
NOTE
For an interesting article on how Cloudkick is using Cassandra to store metrics and monitoring data, see
https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra .
Design Patterns
There are a few ways that people commonly use Cassandra that might be described as design
patterns. I've given names to these common patterns: Materialized View, Valueless Column, and
Aggregate Key.
Materialized View
It is common to create a secondary index that represents additional queries. Because you don't
have a SQL WHERE clause, you can recreate this effect by writing your data to a second column
family that is created specifically to represent that query.
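A minimal sketch of this pattern in Python may make the idea concrete. It is not Cassandra client
code: plain dictionaries stand in for column families, and the names users, users_by_city,
save_user, and usernames_in_city are hypothetical, chosen only for illustration. The point it
shows is that every write goes to both structures, so the "which users live in this city?" query
becomes a single row lookup.

```python
from collections import defaultdict

# Assumption: dictionaries stand in for column families (row key -> {column name: value}).
users = {}                         # keyed by username
users_by_city = defaultdict(dict)  # keyed by city, one column per username


def save_user(username, city, email):
    """Write the user row, then write the same fact to the query-specific family."""
    previous = users.get(username)
    if previous is not None and previous["city"] != city:
        # The application maintains the view itself: drop the stale column on a move.
        users_by_city[previous["city"]].pop(username, None)
    users[username] = {"city": city, "email": email}
    # The column name carries the information; the value is left empty.
    users_by_city[city][username] = ""


def usernames_in_city(city):
    """Answer the city query with one row read instead of scanning every user."""
    return sorted(users_by_city.get(city, {}))
```

The cost of the pattern is visible in save_user: the application, not the database, is
responsible for keeping the second structure in step with the first.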
For example, if you have a User column family and you want to find users in a particular city,
you might create a second column family called UserCity that stores user data with the cities
as row keys (instead of the usernames) and that has columns named for the users who live in that
city. This is a denormalization technique that will speed queries and is an example of specifically
designing your data around your queries (and not the other way around). This usage is common