The Cassandra data model
From version 1.2 onwards, CQL is Cassandra's primary way to access and alter the
database. CQL is an abstraction layer that makes you feel as if you are working with an
RDBMS, but the underlying data model does not support all the features that a traditional
database or SQL provides. There is no group by, no relational integrity or foreign key
constraints, and no join. There is limited support for order by, distinct, and triggers. There
are also extras such as time to live (TTL) and write time functions. So Cassandra, like most
NoSQL databases, generally offers fewer features than traditional databases do.
Cassandra is designed for extremely high read and write speeds and for horizontal
scalability. Because it lacks some of the analytical features of traditional systems, developers
need to work around Cassandra's shortcomings by planning ahead. In the Cassandra community,
this is generally referred to as modeling the database around the queries you will run in the
future. Let's take an example. If you have a people database and you want to draw a bar
chart that shows the number of people from each city, in Cassandra you cannot just
run the statement select count(*), city from people group by city.
Instead, you will have to create a separate table that has city as its primary key and a
counter column that holds the number of people in that city. Every time a person record is
added or removed, you increment or decrement the counter for that city. Understanding
the underlying data structure can help you reason about why Cassandra can or cannot do
certain things.
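The counter-table workaround can be sketched in a few lines of Python. This is a minimal in-memory simulation, not Cassandra code: the dictionaries people and people_by_city, and the helpers add_person, remove_person, and counts_per_city, are hypothetical stand-ins for the two tables and the application logic that keeps them in step. The point is that the per-city totals are maintained at write time, so reading them needs no group by.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two Cassandra tables:
#   people         -- person id (assumed primary key) -> person record
#   people_by_city -- city (primary key) -> counter column
people = {}
people_by_city = defaultdict(int)


def add_person(person_id, name, city):
    """Insert a person and increment the counter for their city."""
    people[person_id] = {"name": name, "city": city}
    # Roughly what a CQL counter update (count = count + 1) does.
    people_by_city[city] += 1


def remove_person(person_id):
    """Delete a person (if present) and decrement their city's counter."""
    record = people.pop(person_id, None)
    if record is not None:
        people_by_city[record["city"]] -= 1


def counts_per_city():
    """Read the precomputed totals instead of running a group by,
    skipping cities whose counter has dropped to zero."""
    return {city: n for city, n in people_by_city.items() if n > 0}


add_person("1", "Alice", "Austin")
add_person("2", "Bob", "Boston")
add_person("3", "Carol", "Austin")
remove_person("2")
print(counts_per_city())  # {'Austin': 2}
```

In real Cassandra, counter columns must live in a table whose non-key columns are all counters, which is why the counts sit in a separate structure here. Note also that the two writes in add_person are not atomic in this sketch; keeping the counter consistent with the people table is the application's responsibility.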
Stripped of all its complexity, data in Cassandra is stored in a nested hash map: a
hash map containing another hash map. Realistically speaking, it is a distributed, nested,
sorted hash map, where the outer sorted hash map is distributed across machines and the
inner one stays on a single machine. The following figure shows the Cassandra data model:
[Figure: The Cassandra data model]
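To make the "distributed, nested, sorted hash map" idea concrete, here is a minimal Python sketch, not a description of how Cassandra is actually implemented. Several parts are assumptions made for illustration: the node names, the MD5-based token function, and the modulo placement of partitions onto nodes are all hypothetical simplifications (real Cassandra assigns token ranges on a ring, and its default partitioner since 1.2, Murmur3Partitioner, does not use MD5). The outer map from partition key to row is split across the simulated nodes, while each row's inner map of column name to value lives entirely on one node and is kept sorted by column name.

```python
import bisect
import hashlib

# Hypothetical three-node cluster; the names are illustrative only.
NODES = ["node-a", "node-b", "node-c"]


def token(partition_key):
    """Hash a partition key to an integer token.
    MD5 is a stand-in; it is not Cassandra's default partitioner hash."""
    return int(hashlib.md5(partition_key.encode()).hexdigest(), 16)


def owner(partition_key):
    """Choose the node that stores the whole partition.
    Simple modulo placement is an assumption (Cassandra uses a token ring)."""
    return NODES[token(partition_key) % len(NODES)]


class SortedRow:
    """Inner map: column name -> value, kept sorted by column name."""

    def __init__(self):
        self._names = []   # column names in sorted order
        self._values = {}  # column name -> value

    def put(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)  # keep names ordered
        self._values[name] = value            # overwrite keeps one entry

    def items(self):
        return [(n, self._values[n]) for n in self._names]


# Outer map, split across nodes: node -> {partition key -> SortedRow}.
cluster = {node: {} for node in NODES}


def put(partition_key, column, value):
    """Write one column; the whole row lands on a single node."""
    store = cluster[owner(partition_key)]
    store.setdefault(partition_key, SortedRow()).put(column, value)


def get_row(partition_key):
    """Return a row's columns in sorted order, or [] if it does not exist."""
    row = cluster[owner(partition_key)].get(partition_key)
    return row.items() if row is not None else []


def partitions_on(node):
    """Partition keys held by one node, ordered by token."""
    return sorted(cluster[node], key=token)


put("alice", "email", "alice@example.com")
put("alice", "age", 30)
put("alice", "city", "Austin")
print(get_row("alice"))
# [('age', 30), ('city', 'Austin'), ('email', 'alice@example.com')]
```

The design mirrors the description above: finding a row requires only hashing its key to locate the owning node, and reading the row's columns in order never leaves that node.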
Cassandra has two ways of viewing its data: one views data as maps within a map, the
other views it as a table. The former is the older view and is closer to how the data is
actually stored; the latter is the newer view, the way CQL represents the data. In this