duck | Anatidae | Anas | A. platyrhynchos | null
As you can see in Figure 3.2, when we use a COMPOUND KEY, the data for
wolf and for dog is stored on the same server. This is because we changed the
partition key to “family” and clustered on “genus.” In practice, this means that
all the data for a given family is stored on the same set of replicas and
presorted, or clustered, by genus. This allows very fast lookups when the family
and genus of an animal are known.
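The placement described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the three node names are hypothetical, MD5 stands in for Cassandra's real partitioner, and the wolf and dog rows are filled in from standard taxonomy rather than from the figure. It shows that rows sharing a partition key (family) land on one node and are kept ordered by the clustering key (genus).

```python
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster


def node_for(partition_key: str) -> str:
    """Map a partition key to a node (MD5 is a stand-in for the real partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]


# Illustrative rows: (name, family, genus, species)
ANIMALS = [
    ("wolf", "Canidae", "Canis", "C. lupus"),
    ("duck", "Anatidae", "Anas", "A. platyrhynchos"),
    ("dog", "Canidae", "Canis", "C. lupus familiaris"),
]

# Partition key = family; clustering key = genus.
partitions = defaultdict(list)
for name, family, genus, species in ANIMALS:
    partitions[family].append((genus, name, species))

# Rows inside a partition are kept sorted by the clustering key.
for rows in partitions.values():
    rows.sort()


def lookup(family: str, genus: str) -> list:
    """Return the animals for a known family and genus from a single partition."""
    return [name for g, name, _ in partitions[family] if g == genus]


if __name__ == "__main__":
    for family in partitions:
        print(family, "->", node_for(family), partitions[family])
    print(lookup("Canidae", "Canis"))
```

Because wolf and dog share the partition key "Canidae", `node_for` returns the same node for both, and a lookup by family and genus touches only that one partition.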
Model Queries—Not Data
The first thing you should consider when creating a data model in Cassandra is the
performance characteristics of the query patterns. In Cassandra, rows are not
segmented across nodes. This means that if a row exists on a node, the entire row
exists on that node. If you have a heavy read or write load on a particular key,
this can lead to hot spots. Hot spots occur when a particular key (row) receives so
many queries that the load on the machine holding it spikes. High load on a single
machine can cause cluster-wide complications as the communication channels start
to back up. You also need to consider the row size. A single row has to fit on
disk; if a single row contains billions of columns, it may exceed the available
disk space on the drive.
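A rough way to see the hot-spot effect is to count how many requests each node would serve under two workloads. The sketch below is a toy simulation, not Cassandra itself: the node names, the key names, and the use of MD5 as a placement function are all assumptions made for illustration.

```python
import hashlib
from collections import Counter

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster


def node_for(row_key: str) -> str:
    """Pick the node that owns a row key (MD5 stands in for the real partitioner)."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]


def load_per_node(requests) -> Counter:
    """Count how many requests each node would serve."""
    return Counter(node_for(key) for key in requests)


# Skewed workload: every request targets the same row key.
skewed = ["events"] * 9000
# Spread workload: the same number of requests over many distinct row keys.
spread = [f"events:{i}" for i in range(9000)]

skewed_load = load_per_node(skewed)
spread_load = load_per_node(spread)

if __name__ == "__main__":
    print("skewed:", dict(skewed_load))
    print("spread:", dict(spread_load))
```

With the skewed workload, a single node absorbs every request because the whole row lives on that node; with many distinct keys, the same traffic is divided among the nodes.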
In Listing 3.3, you can see a typical way to store event logs in a relational
database. There is an auto-incrementing ID field, an event time, an event type
ID that refers to a row in an event type table, and some information about the
event. While you may be able to mimic this model in Cassandra, it would perform
poorly: because Cassandra does not support joins, each query would require two
lookups (one for the event row and the other for the event type).
Listing 3.3 Example of a Relational Data Model for Log Storage
CREATE TABLE events (
    id INT PRIMARY KEY,
    time TIME,
    event_type INT REFERENCES event_types(id),
    data TEXT
);
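To make the "two lookups" point concrete, the sketch below runs the relational model in an in-memory SQLite database and compares a joined query with the pair of lookups a client would have to issue when joins are unavailable. SQLite is used only as a stand-in to keep the example runnable; the `name` column on `event_types` and the sample rows are assumptions, not part of the listing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The event_types.name column and the sample rows are made up for illustration.
conn.executescript(
    """
    CREATE TABLE event_types (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        time TEXT,
        event_type INTEGER REFERENCES event_types(id),
        data TEXT
    );
    INSERT INTO event_types VALUES (1, 'login');
    INSERT INTO events VALUES (100, '12:00:00', 1, 'user=alice');
    """
)

# Relational approach: one query, and the database performs the join.
joined = conn.execute(
    "SELECT e.id, t.name, e.data FROM events e "
    "JOIN event_types t ON t.id = e.event_type WHERE e.id = ?",
    (100,),
).fetchone()

# Without joins, the client issues two lookups and combines the results itself.
event = conn.execute(
    "SELECT id, event_type, data FROM events WHERE id = ?", (100,)
).fetchone()
type_name = conn.execute(
    "SELECT name FROM event_types WHERE id = ?", (event[1],)
).fetchone()[0]
manual = (event[0], type_name, event[2])

if __name__ == "__main__":
    print(joined)
    print(manual)
```

Both paths produce the same record, but the second one costs an extra round trip per event, which is the overhead the relational layout would impose if copied into Cassandra unchanged.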
 