Data Modeling - Practical Cassandra


3. Data Modeling

When creating a data model for your keyspace, the most important thing to do is to forget everything you know about relational data modeling. Relational data models are designed for efficient storage, relational lookups, and associations between concerns. The Cassandra data model is designed for raw performance and storage of vast amounts of data.

Unlike relational databases, the data model for Cassandra is based on the query patterns required. This means that you have to know the read/write patterns before you create your data model. This also applies to indexes. Indexes in Cassandra are a requirement for specific types of queries, unlike a relational database where indexes are a performance-tuning device.

In this chapter, we will highlight some key differences between creating a relational model and a Cassandra model. We will then dive into an example data model for storing time-series data.

The Cassandra Data Model

To understand how to model in Cassandra, you must first understand how the Cassandra data model works. Cassandra takes its data distribution model from Amazon's Dynamo paper and its data representation from Google's Bigtable paper.

When you create a table using CQL, you are not only telling Cassandra the name and type of each piece of data, you are also telling it how to store and distribute that data. This is done via the PRIMARY KEY clause. The PRIMARY KEY tells the Cassandra storage system to distribute the data based on the value of this key; this is known as the partition key. When there are multiple fields in the PRIMARY KEY, as is the case with compound keys, the first field is the partition key (how the data is distributed across nodes) and the subsequent fields are known as the clustering keys (how the data is ordered on disk within a partition). Clustering keys allow you to pregroup your data by the values in the keys. Using compound keys in Cassandra is commonly referred to as “wide rows.” “Wide rows” refers to the rows that Cassandra stores on disk, rather than the rows that are presented to you when you make a query.
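
The split between partition key and clustering key can be illustrated with a small toy model. The following is only a sketch under stated assumptions, not how Cassandra is implemented: the function names `node_for` and `store`, the sensor rows, and the MD5-modulo placement are all hypothetical stand-ins. Real Cassandra uses a partitioner and a token ring rather than a simple modulo. The sketch shows two things: every row with the same partition key lands on the same node, and rows inside a partition are kept sorted by clustering key.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 5  # assumption: a five-node cluster, as in the text


def node_for(partition_key, num_nodes=NUM_NODES):
    """Toy placement: hash the partition key and map it to a node index.

    Assumption: MD5-modulo is a stand-in for Cassandra's partitioner and
    token ring, chosen only to keep the sketch dependency-free.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def store(rows, num_nodes=NUM_NODES):
    """Return {node: {partition_key: [(clustering_key, value), ...]}}."""
    cluster = defaultdict(lambda: defaultdict(list))
    for partition_key, clustering_key, value in rows:
        node = node_for(partition_key, num_nodes)
        cluster[node][partition_key].append((clustering_key, value))
    # Within each partition, keep the cells ordered by clustering key.
    for partitions in cluster.values():
        for cells in partitions.values():
            cells.sort(key=lambda cell: cell[0])
    return cluster


# Hypothetical time-series rows: (partition key, clustering key, value).
rows = [
    ("sensor-a", "2013-01-02", 21.5),
    ("sensor-b", "2013-01-01", 19.0),
    ("sensor-a", "2013-01-01", 20.1),
]
cluster = store(rows)
for node, partitions in sorted(cluster.items()):
    for partition_key, cells in partitions.items():
        print(node, partition_key, cells)
```

In this model, each inner list plays the role of one "wide row": a single stored partition that holds many of the rows a query would return.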

Figure 3.1 shows how the data in Listing 3.1 might be stored in a five-node cluster.
