Effective CQL - Mastering Apache Cassandra

The Cassandra data model

From version 1.2 onwards, Cassandra has used CQL as its primary way to access and alter the
database. CQL is an abstraction layer that makes you feel like you are working with an
RDBMS, but the underlying data model does not support all the features that a traditional
database or SQL provides. There is no GROUP BY, no JOIN, and no referential integrity or
foreign key constraints. There is some support for ORDER BY, DISTINCT, and triggers. There
are also features such as time to live (TTL) and the write-time function, WRITETIME. So
Cassandra, like most NoSQL databases, generally offers fewer features than traditional
databases.

Cassandra is designed for extremely high read and write speeds and for horizontal
scalability. Because it lacks some of the analytical features of traditional systems,
developers need to work around Cassandra's shortcomings by planning ahead. In the Cassandra
community, this is generally referred to as modeling the database around the queries you
will run in the future. Let's take an example. If you have a people database and you want
to draw a bar chart that shows the number of people from different cities, in Cassandra you
cannot just run the select count(*), city from people group by city statement.

Instead, you will have to create a different table that has city as its primary key and a
counter column that holds the number of people records for that city. Every time a people
record is added or removed, you increase or decrease the counter for the specific city.
Understanding the underlying data structure can help you rationalize why Cassandra can or
cannot do some things.
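
The counter-table approach described above can be sketched in plain Python. This is only an in-memory illustration of the bookkeeping, not driver code: the names people, people_by_city, add_person, and remove_person are hypothetical, and the CQL shown in the comments is an assumed schema for such a table rather than one given in the text.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for two Cassandra tables.
# In CQL the second table might be declared roughly as:
#   CREATE TABLE people_by_city (city text PRIMARY KEY, people_count counter);
people = {}                 # person_id -> city
people_by_city = Counter()  # city -> number of people records


def add_person(person_id, city):
    """Insert a person and increase the counter for their city."""
    if person_id in people:
        raise ValueError(f"person {person_id!r} already exists")
    people[person_id] = city
    # Assumed CQL analogue:
    #   UPDATE people_by_city SET people_count = people_count + 1 WHERE city = ?;
    people_by_city[city] += 1


def remove_person(person_id):
    """Delete a person and decrease the counter for their city."""
    city = people.pop(person_id)
    people_by_city[city] -= 1


add_person(1, "Pune")
add_person(2, "Pune")
add_person(3, "Austin")
remove_person(2)

# The bar-chart data is now a direct lookup by city; no GROUP BY is needed.
print(dict(people_by_city))  # {'Pune': 1, 'Austin': 1}
```

The trade-off is that every write to the people data has to touch two places, which is the price of answering the per-city question with a single key lookup.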

If you remove all the complexity, the data in Cassandra is stored in a nested hash map: a
hash map containing another hash map. Realistically speaking, it is a distributed, nested,
sorted hash map where the outer sorted hash map is distributed across the machines and the
inner one stays on one machine. The following figure shows the Cassandra data model:
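
As a rough illustration in code, the nested map idea can be modeled with ordinary Python dictionaries. This is a simplified sketch under stated assumptions: NODE_COUNT, token, node_for, put, and get_row are hypothetical names, MD5 is used only as a convenient hash, and the modulo placement stands in for the token ranges that a real cluster assigns to its nodes.

```python
import hashlib

NODE_COUNT = 3  # hypothetical cluster size

# One outer map per node: partition key -> inner map of column name -> value.
nodes = [dict() for _ in range(NODE_COUNT)]


def token(partition_key):
    """Hash the partition key to an integer token (MD5 here, for illustration only)."""
    return int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)


def node_for(partition_key):
    """Pick the node that owns a partition (simple modulo, not a real token ring)."""
    return token(partition_key) % NODE_COUNT


def put(partition_key, column, value):
    """Write one cell; the whole inner map for a key lives on a single node."""
    inner = nodes[node_for(partition_key)].setdefault(partition_key, {})
    inner[column] = value


def get_row(partition_key):
    """Return the inner map's cells ordered by column name."""
    inner = nodes[node_for(partition_key)].get(partition_key, {})
    return sorted(inner.items())


put("alice", "city", "Pune")
put("alice", "age", 31)
put("bob", "city", "Austin")

print(get_row("alice"))  # [('age', 31), ('city', 'Pune')]
```

Looking up a key touches exactly one node, and the cells inside that key come back in column order, which mirrors the "outer map distributed, inner map local and sorted" description.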

Cassandra has two ways of viewing its data: one is viewing data as maps within a map, the
other is viewing it as a table. The former is the old way and is closer to how the data is
actually stored; the latter is the new way, the way CQL represents the data. In this
