Quick Start - Mastering Apache Cassandra


One obvious benefit of having such a flexible data storage mechanism is that you can have an arbitrary number of cells with customized names and have a partition key store data as a list of tuples (a tuple is an ordered set; in this case, the tuple is a key-value pair). This comes in handy when you have to store things such as time series. For example, you might want to use Cassandra to store your Facebook timeline or your Twitter feed, or you might want the partition key to be a sensor ID and each cell to represent a tuple whose name is the timestamp when the data was created and whose value is the data sent by the sensor. Also, within a partition, cells are ordered by cell name by default. So, in our sensor case, you get data sorted by time for free. The other difference is that, unlike an RDBMS, Cassandra does not have relations. This means relational logic needs to be handled at the application level. It also means that we may want to denormalize the database, because there are no joins and we want to avoid looking up multiple tables by running multiple queries. Denormalization is the process of adding redundancy to data to achieve high read performance. For more information, visit http://en.wikipedia.org/wiki/Denormalization.
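
To make the sensor example concrete, here is a minimal Python sketch of a single wide partition, assuming a purely in-memory model rather than Cassandra's storage engine or any driver API. The class name `WidePartitionStore` and the ISO-8601 timestamp cell names are hypothetical. Cells are kept sorted by name on insert, and writing an existing cell name overwrites its value, so a read returns the readings in time order without a separate sort step.

```python
import bisect
from collections import defaultdict


class WidePartitionStore:
    """Hypothetical in-memory model of wide partitions (not a Cassandra API).

    Each partition key maps to a list of (cell_name, value) tuples that is
    kept sorted by cell_name, mimicking how cells in a partition are ordered.
    """

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, partition_key, cell_name, value):
        cells = self._partitions[partition_key]
        names = [name for name, _ in cells]  # linear scan; fine for a sketch
        i = bisect.bisect_left(names, cell_name)
        if i < len(cells) and cells[i][0] == cell_name:
            cells[i] = (cell_name, value)  # same cell name: overwrite
        else:
            cells.insert(i, (cell_name, value))  # keep name order

    def read(self, partition_key):
        # Cells are already in name order, so no sorting happens on read.
        return list(self._partitions.get(partition_key, []))


store = WidePartitionStore()
# Readings arrive out of order; fixed-format ISO-8601 strings sort in time order.
store.insert("sensor-42", "2015-01-01T10:05:00", 21.7)
store.insert("sensor-42", "2015-01-01T10:00:00", 21.5)
store.insert("sensor-42", "2015-01-01T10:10:00", 21.9)
print(store.read("sensor-42"))
# [('2015-01-01T10:00:00', 21.5), ('2015-01-01T10:05:00', 21.7), ('2015-01-01T10:10:00', 21.9)]
```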

Partitions are distributed across the cluster, creating effective auto-sharding. Each server holds one or more ranges of keys. So, if the cluster is balanced, a cluster with more nodes will have fewer rows per node. All these concepts will be covered in detail in later chapters.
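
The idea of each node holding a range of keys can be sketched as a token ring. This is a simplified Python model under stated assumptions: the 64-bit token space, the MD5-based `token()` function, and the one-token-per-node layout are illustrative stand-ins, not Cassandra's partitioner (its default, Murmur3Partitioner, uses a different hash). A key is assigned to the first node whose token is greater than or equal to the key's token, wrapping around the ring.

```python
import bisect
import hashlib
from collections import Counter

RING_SIZE = 2 ** 64  # illustrative token space, not Cassandra's exact range


def token(partition_key):
    # Stand-in hash: first 8 bytes of MD5 as an unsigned integer.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def make_ring(node_names):
    # One evenly spaced token per node models a balanced cluster.
    n = len(node_names)
    tokens = [i * RING_SIZE // n for i in range(n)]
    return tokens, list(node_names)


def owner(ring, partition_key):
    tokens, nodes = ring
    # Each node owns (previous token, its token]; keys past the last token
    # wrap around to the first node.
    i = bisect.bisect_left(tokens, token(partition_key))
    return nodes[i % len(nodes)]


keys = [f"sensor-{i}" for i in range(1000)]
for n in (2, 4):
    ring = make_ring([f"node{j}" for j in range(n)])
    counts = Counter(owner(ring, k) for k in keys)
    print(n, "nodes:", dict(sorted(counts.items())))
```

With evenly spaced tokens, doubling the node count roughly halves the number of partitions each node holds, which is the balanced case described above.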

Note

Types of keys

In the context of Cassandra, you may find the concept of keys a bit confusing. There are five terms that you may encounter. Here is what they generally mean:

• Primary key: This is the column or group of columns that uniquely identifies a row of the CQL table.

• Composite key: This is a type of primary key that is made up of more than one column. Sometimes, the composite key is also referred to as the compound key.

• Partition key: Cassandra's internal data representation is large rows, each with a unique key called the row key. Cassandra uses these row key values to distribute data across cluster nodes. Since these row keys are used to partition data, they are called partition keys. When you define a table with a simple key, that key is the partition key. If you define a table with a composite key, the first term of that composite key works as the partition key. This means all the CQL rows with the same partition key live together on one machine (or, with replication, on the same set of replica machines).

• Clustering key: This is the column (or columns) that tells Cassandra how the data within a partition is ordered (or clustered). This essentially provides presorted retrieval if you know the order in which you want your data to be retrieved.
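
The relationship between these keys can be sketched with the sensor example. The Python below is a hypothetical in-memory model, assuming a table declared in CQL as `PRIMARY KEY (sensor_id, ts)`; the column names are illustrative. It groups rows by the partition key (`sensor_id`), orders each partition by the clustering key (`ts`), and treats the two together as the primary key, so a second row with the same pair replaces the first, in line with Cassandra's upsert behavior on writes.

```python
from collections import defaultdict

# Hypothetical table modeled on CQL: PRIMARY KEY (sensor_id, ts)
PARTITION_KEY = ("sensor_id",)  # first term: decides placement in the cluster
CLUSTERING_KEY = ("ts",)        # remaining term(s): order inside the partition


def build_partitions(rows):
    partitions = defaultdict(dict)
    for row in rows:
        pk = tuple(row[c] for c in PARTITION_KEY)
        ck = tuple(row[c] for c in CLUSTERING_KEY)
        # Partition key + clustering key together form the primary key, so a
        # later row with the same pair replaces the earlier one (an upsert).
        partitions[pk][ck] = row
    # Within each partition, return rows sorted by clustering key.
    return {pk: [by_ck[ck] for ck in sorted(by_ck)]
            for pk, by_ck in partitions.items()}


rows = [
    {"sensor_id": "s1", "ts": 3, "value": 0.3},
    {"sensor_id": "s2", "ts": 1, "value": 9.1},
    {"sensor_id": "s1", "ts": 1, "value": 0.1},
    {"sensor_id": "s1", "ts": 3, "value": 0.4},  # same primary key as row 1
]
for pk, part in build_partitions(rows).items():
    print(pk, [(r["ts"], r["value"]) for r in part])
# ('s1',) [(1, 0.1), (3, 0.4)]
# ('s2',) [(1, 9.1)]
```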
