Denormalizing Data for Maximum Performance - Learning Apache Cassandra

Database Reference

In-Depth Information

Partial denormalization

Our initial approach to home timelines, which used the existing, fully normalized data

structure that we've already built, is technically viable but will perform very poorly at scale.

If I follow F users and want a page of size P for my home timeline, Cassandra will need to

do the following:

• Query F partitions for P rows each

• Perform an ordered merge of F x P rows in order to retrieve only the most recent P

The most distressing part of this is the fact that both operations grow in complexity propor-

tionally with the number of people I follow. Let's start by trying to fix this.

The basic goal of the home timeline is to show me the most recent status updates that mat-

ter to me. Instead of doing all the work to find out what status updates matter to me, based

on who I follow, at read time, let's shift some of the work to write time.

I'll create a table that stores references to status updates that I care about. Whenever

someone I follow creates a new status update, I'll add a reference to that update to my home

timeline list; this way, when I want to view my home timeline, I'll know exactly what status

updates to return.

Let's create our new table:

CREATE TABLE "home_status_update_ids" (

"timeline_username" text,

"status_update_id" timeuuid,

"status_update_username" text,

PRIMARY KEY ("timeline_username", "status_update_id")

)

WITH CLUSTERING ORDER BY ("status_update_id" DESC);

Note that there's something subtle going on here with the primary key. In the

user_status_updates table, the username column of the author is the partition

key, and the timestamp UUID, id , is the clustering column. In order to retrieve a specific

row from this table, we need to provide both the username and the id columns.

However, our home_status_update_ids table does not include the author's user-

name in the primary key: the status_update_username column is just a normal data

Search WWH ::

Custom Search

Home