3. Data Modeling
When creating a data model for your keyspace, the most important thing to do is to
forget everything you know about relational data modeling. Relational data models
are designed for efficient storage, relational lookups, and associations between
concerns. The Cassandra data model is designed for raw performance and storage
of vast amounts of data.
Unlike relational databases, the data model for Cassandra is based on the query
patterns required. This means that you have to know the read/write patterns before
you create your data model. This also applies to indexes. Indexes in Cassandra are
a requirement for specific types of queries, unlike a relational database where
indexes are a performance-tuning device.
In this chapter, we will highlight some key differences between creating a
relational model and a Cassandra model. We will then dive into an example data
model for storing time-series data.
The Cassandra Data Model
To understand how to model in Cassandra, you must first understand how the
Cassandra data model works. Cassandra takes its data distribution design from the
Dynamo whitepaper by Amazon and its data representation from the BigTable
whitepaper by Google.
When creating a table using CQL, you are not only telling Cassandra what the
names and types of your columns are, you are also telling it how to store and
distribute your data. This is done via the PRIMARY KEY clause. With a single-field
PRIMARY KEY, Cassandra distributes the data across the cluster based on the value
of that field; this is known as the partition key. When there are multiple fields in
the PRIMARY KEY, as is the case with compound keys, the first field is the partition
key (how the data is distributed) and the subsequent fields are known as the
clustering keys (how the data is ordered on disk within a partition). Clustering
keys allow you to pregroup your data by the values in the keys. Using compound keys
in Cassandra is commonly referred to as “wide rows.” “Wide rows” refers to the rows
that Cassandra is storing on disk, rather than the rows that are represented to you
when you make a query.
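As a rough illustration of the wide-row idea, the following Python sketch groups
rows by a partition key and keeps each group sorted by a clustering key. The table
in the comment, the column names (sensor_id, reading_time, value), and the function
build_partitions are hypothetical and chosen only for this example; the sketch
models the concept and is not how Cassandra's storage engine is implemented.

```python
from bisect import insort
from collections import defaultdict

# Hypothetical table, for illustration only:
#   CREATE TABLE readings (
#       sensor_id text, reading_time int, value double,
#       PRIMARY KEY (sensor_id, reading_time));
# sensor_id plays the role of the partition key,
# reading_time plays the role of the clustering key.


def build_partitions(rows):
    """Group rows by partition key; keep each group sorted by clustering key."""
    partitions = defaultdict(list)
    for sensor_id, reading_time, value in rows:
        # insort keeps the list ordered by (reading_time, value) on every insert
        insort(partitions[sensor_id], (reading_time, value))
    return dict(partitions)


# Rows arrive in arbitrary order, as they would from independent writers.
rows = [
    ("sensor-a", 3, 20.5),
    ("sensor-b", 1, 18.0),
    ("sensor-a", 1, 19.9),
    ("sensor-a", 2, 20.1),
]
partitions = build_partitions(rows)
```

Each entry in partitions corresponds to one "wide row": a single partition key
holding many clustered cells. A query returns those cells to you as separate rows,
which is the distinction the paragraph above draws.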
Figure 3.1 shows how the data in Listing 3.1 might be stored in a five-node
cluster.
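To make the idea of distribution by partition key concrete, here is a minimal
Python sketch that assigns partition keys to one of five nodes. It rests on several
simplifying assumptions: the node names and the function node_for are made up,
SHA-256 stands in for Cassandra's partitioner, and a simple modulo replaces the
token ranges a real cluster uses. It shows only that placement depends on the
partition key alone.

```python
import hashlib

# Hypothetical node names for a five-node cluster.
NODES = ["node1", "node2", "node3", "node4", "node5"]


def node_for(partition_key, nodes=NODES):
    """Pick a node from a hash of the partition key (simplified stand-in)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    # Treat the first 8 bytes of the digest as an integer "token".
    token = int.from_bytes(digest[:8], "big")
    # Assumption: modulo placement instead of real token ranges.
    return nodes[token % len(nodes)]


# Every row sharing a partition key lands on the same node,
# whatever its clustering key values are.
placement = {key: node_for(key) for key in ["sensor-a", "sensor-b", "sensor-c"]}
```

Because the clustering keys never enter the hash, all the cells of one wide row
stay together on a single node (ignoring replication).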