To compensate for this, since we know the hour we want ahead of time, we can key off that, and by using dynamic tables in Cassandra, we can ensure that every event for that particular hour exists, physically, on the same row. When you specify multiple fields in the PRIMARY KEY, Cassandra will key off the first field, and every subsequent field will be part of the column name. In Listing 3.5, we restructure the code to store fields by hour in the table to make hourly lookups of events easier.
Listing 3.5 Example of Cassandra Data Model for Log Storage (Low Traffic)
CREATE TABLE events (
    hour TIMESTAMP,
    id UUID,
    time TIMESTAMP,
    event_type TEXT,
    data TEXT,
    PRIMARY KEY (hour, id)
);
With this new model, we can look up all events from a single hour easily, as everything that happened within that hour exists in a single row on disk. This may be a perfectly suitable model for a low-volume application.
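For example, fetching every event from a given hour becomes a single-partition lookup against the hour key. A sketch of such a query follows; the timestamp literal is illustrative:

```sql
-- Retrieve all events logged during the 09:00 hour
-- (everything lives in one row keyed by that hour)
SELECT id, time, event_type, data
FROM events
WHERE hour = '2013-06-12 09:00:00';
```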
However, if you have heavy reads and/or writes, you may need to segment the row into multiple rows. An event per row is difficult to query, and a row every hour can lead to hot spots, because all reads and writes go to the same files on disk. We can instead segment the row into multiple rows that remain easy to read, as long as we have a piece of information to query by; in this case, it is event_type. We can further improve performance by ensuring that the order is stored by event time rather than ID. This makes it possible to do range queries based on the time of an event without scanning the entire row. In Listing 3.6, we will use a composite row key and clustering order to remove hot spots and order by event time in a descending fashion.
Listing 3.6 Example of Cassandra Data Model for Log Storage (Optimized)
CREATE TABLE events (
    hour TIMESTAMP,
    event_type TEXT,
    time TIMESTAMP,
    id UUID,
    data TEXT,
    PRIMARY KEY ((hour, event_type), time, id)
) WITH CLUSTERING ORDER BY (time DESC, id DESC);
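With a composite partition key of hour and event_type, each event type for a given hour lands in its own partition, and clustering by time makes time-bounded reads a sequential slice of that partition. A sketch of such a range query follows; the event type 'login' and the timestamp literals are illustrative:

```sql
-- Most recent 'login' events from the 09:00 hour, newest first,
-- restricted to the last half of the hour without scanning the whole partition
SELECT time, id, data
FROM events
WHERE hour = '2013-06-12 09:00:00'
  AND event_type = 'login'
  AND time >= '2013-06-12 09:30:00'
LIMIT 50;
```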