Modeling data
In the RDBMS world, you would look over the entities and think about their relations while
modeling the application, and then join tables to get the required data. There is no
join option in Cassandra, so we will have to denormalize things. Looking at the previously
mentioned specifications, we can say that:
• We need a blogs table to store the blog name and other global information, such as
the blogger's username and password
• We will have to pull posts for the blog, ideally sorted in reverse chronological
order
• We will also have to pull all the comments for each post, when we see the post
page
• We will have to maintain tags in such a way that tags can be used to pull all the
posts with the same tag
• We will also have to have counters for the upvotes and downvotes for posts and
comments
With the preceding details, let's see the tables we need:
blogs: This table will hold global blog metadata and user information, such as
blog name, username, password, and other metadata.
posts: This table will hold individual posts. At first glance, posts seems to be
an ordinary table with the post ID as the primary key and a reference to the blog that it
belongs to. The problem arises when we add the requirement that posts be
sortable by timestamp. Unlike in an RDBMS, you cannot just perform an ORDER BY
operation across partitions. The workaround is to use a composite key. A
composite key consists of a partition key and one or more additional columns. The
partition key determines where the rest of the columns are going to be stored, and
the other columns in the composite key determine the relative ordering of the
columns that are inserted as a row with that key.
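To make the two roles of a composite key concrete, here is a minimal in-memory sketch in Python. It is not Cassandra code and not the schema used in this book: the PostsTable class and the blog_id, posted_on, and title names are hypothetical, chosen only for illustration. The dictionary key stands in for the partition key (it decides which bucket a row lands in), and the timestamp stands in for the other key column (it decides the order of rows inside that bucket).

```python
from bisect import insort
from datetime import datetime


class PostsTable:
    """Toy stand-in for a table keyed by (blog_id, posted_on). Hypothetical names."""

    def __init__(self):
        # partition key -> list of (posted_on, title), kept sorted on insert
        self._partitions = {}

    def insert(self, blog_id, posted_on, title):
        # The partition key selects the bucket; the timestamp fixes the
        # row's position inside that bucket.
        rows = self._partitions.setdefault(blog_id, [])
        insort(rows, (posted_on, title))

    def latest(self, blog_id, limit=10):
        # A read touches exactly one partition. Rows are already ordered,
        # so reverse chronological order is just a reversed walk.
        rows = self._partitions.get(blog_id, [])
        return [title for _, title in reversed(rows)][:limit]


table = PostsTable()
table.insert("blog-1", datetime(2014, 1, 5), "first post")
table.insert("blog-1", datetime(2014, 3, 9), "third post")
table.insert("blog-1", datetime(2014, 2, 1), "second post")
table.insert("blog-2", datetime(2014, 2, 15), "other blog")

print(table.latest("blog-1"))
```

Note that ordering is only defined within a single partition here, which mirrors why sorting across partitions is not available.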
Remember that a partition is completely stored on a node. The benefit of this is
that fetches are faster, but at the same time a partition is limited by the total
number of cells that it can hold, which is 2 billion. The other downside is that
having everything in one partition may cause lots of requests to go to only a
couple of nodes (replicas), making them a hotspot in the cluster.
You can avoid this by using some sort of bucketing, such as including months and
years in the partition key. This will make sure that the partition changes every