To compensate for this, since we know the hour we want ahead of time, we can key off that, and by using dynamic tables in Cassandra, we can ensure that every event for that particular hour exists, physically, on the same row. When you specify multiple fields in the PRIMARY KEY, Cassandra will key off the first field, and every subsequent field will be part of the column name. In Listing 3.5, we restructure the code to store fields by hour in the table to make hourly lookups of events easier.
Listing 3.5 Example of Cassandra Data Model for Log Storage (Low Traffic)
CREATE TABLE events (
    hour TIMESTAMP,
    id UUID,
    time TIMESTAMP,
    event_type TEXT,
    data TEXT,
    PRIMARY KEY (hour, id)
);
With this new model, we can look up all events from a single hour easily, as everything that happened within that hour exists in a single row on disk. This may be a perfectly suitable model for a low-volume application.
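For example, fetching every event from a given hour becomes a single-partition lookup against the hour key. A sketch of such a query follows; the timestamp literal is illustrative:

```sql
-- Retrieve all events logged during the 09:00 hour
-- (everything lives in one row keyed by that hour)
SELECT id, time, event_type, data
FROM events
WHERE hour = '2013-06-12 09:00:00';
```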
However, if you have heavy reads and/or writes, you may need to segment the row into multiple rows. An event per row is difficult to query, and a row every hour can lead to hot spots, because all reads and writes go to the same files on disk. We can instead segment the row into multiple rows that remain easy to read, as long as we have a piece of information to query by; in this case, it is event_type. We can further improve performance by ensuring that the order is stored by event time rather than ID. This makes it possible to do range queries based on the time of an event without scanning the entire row. In Listing 3.6, we will use a composite row key and clustering order to remove hot spots and order by event time in a descending fashion.
Listing 3.6 Example of Cassandra Data Model for Log Storage (Optimized)
CREATE TABLE events (
    hour TIMESTAMP,
    event_type TEXT,
    time TIMESTAMP,
    id UUID,
    data TEXT,
    PRIMARY KEY ((hour, event_type), time, id)
) WITH CLUSTERING ORDER BY (time DESC, id DESC);
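With a composite partition key of hour and event_type, each event type for a given hour lands in its own partition, and clustering by time makes time-bounded reads a sequential slice of that partition. A sketch of such a range query follows; the event type 'login' and the timestamp literals are illustrative:

```sql
-- Most recent 'login' events from the 09:00 hour, newest first,
-- restricted to the last half of the hour without scanning the whole partition
SELECT time, id, data
FROM events
WHERE hour = '2013-06-12 09:00:00'
  AND event_type = 'login'
  AND time >= '2013-06-12 09:30:00'
LIMIT 50;
```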