Quick Start - Mastering Apache Cassandra

Modeling data

In the RDBMS world, you would look over the entities and think about their relations while modeling the application, and then join tables to get the required data. Cassandra has no join option, so we will have to denormalize the data. Looking at the previously mentioned specifications, we can say that:

• We need a blogs table to store the blog name and other global information, such as the blogger's username and password
• We will have to pull posts for the blog, ideally sorted in reverse chronological order
• We will also have to pull all the comments for a post when we view the post page
• We will have to maintain tags in such a way that a tag can be used to pull all the posts with that tag
• We will also need counters for the upvotes and downvotes on posts and comments

With the preceding details, let's see the tables we need:

• blogs : This table will hold global blog metadata and user information, such as

blog name, username, password, and other metadata.

• posts : This table will hold individual posts. At first glance, posts seems to be an ordinary table whose primary key is the post ID, with a reference to the blog that the post belongs to. The problem arises when we add the requirement that posts be sortable by timestamp. Unlike in an RDBMS, you cannot just perform an ORDER BY operation across partitions. The workaround is to use a composite key. A composite key consists of a partition key, which determines where the row's columns are going to be stored, and one or more additional columns. These additional columns determine the relative ordering of the rows that are inserted under the same partition key.
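To make the split between the partition key and the ordering columns concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Cassandra client code, and the names ToyPostsTable, blog_id, posted_at, and title are hypothetical, chosen only for illustration. The blog ID plays the role of the partition key, and the post timestamp plays the role of the ordering column within that partition.

```python
from collections import defaultdict


class ToyPostsTable:
    """Toy model of a table with a composite key.

    Assumption for illustration: blog_id acts as the partition key and
    posted_at acts as the column that orders rows inside a partition.
    """

    def __init__(self):
        # partition key -> list of (posted_at, title) rows
        self._partitions = defaultdict(list)

    def insert(self, blog_id, posted_at, title):
        rows = self._partitions[blog_id]
        rows.append((posted_at, title))
        # Keep each partition sorted newest first, so reads need no sorting.
        rows.sort(key=lambda row: row[0], reverse=True)

    def posts_for_blog(self, blog_id, limit=None):
        # A read touches exactly one partition; ordering comes for free.
        rows = self._partitions.get(blog_id, [])
        return [title for _, title in rows[:limit]]


posts = ToyPostsTable()
posts.insert("blog-1", 1700000300, "Third post")
posts.insert("blog-1", 1700000100, "First post")
posts.insert("blog-2", 1700000150, "Other blog")
posts.insert("blog-1", 1700000200, "Second post")

latest_two = posts.posts_for_blog("blog-1", limit=2)
print(latest_two)
```

The point of the sketch is that ordering is only defined inside a single partition: asking for the posts of blog-1 in reverse chronological order is cheap, while a global ordering across all blogs would require visiting every partition.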

Remember that a partition is stored in its entirety on a node. The benefit is that fetches are faster, but a partition is limited in the total number of cells it can hold, which is 2 billion. The other downside of keeping everything in one partition is that it may cause lots of requests to go to only a couple of nodes (the replicas), making them a hotspot in the cluster. You can avoid this with some sort of bucketing, such as including the month and year in the partition key. This will make sure that the partition changes every
