Organizing Related Data - Learning Apache Cassandra


The structure of the status updates table

The most notable aspect of user_status_updates is that it has two columns in its

primary key. This means that each row is identified uniquely by the combination of its

username and id columns. It also means that every row must have a value in both of

these columns.
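
The uniqueness rule described above can be sketched with a toy in-memory model. This is only an illustration in Python, not how Cassandra stores data; the StatusUpdates class and its methods are hypothetical names introduced for the example.

```python
import uuid


class StatusUpdates:
    """Toy model of a table whose primary key is (username, id)."""

    def __init__(self):
        # Maps the (username, id) pair to the body of the status update.
        self._rows = {}

    def insert(self, username, update_id, body):
        # Every row must have a value in both primary key columns.
        if username is None or update_id is None:
            raise ValueError("username and id must both be set")
        # A row is identified by the pair, so writing the same pair
        # again replaces the existing row instead of adding a new one.
        self._rows[(username, update_id)] = body

    def get(self, username, update_id):
        return self._rows.get((username, update_id))

    def __len__(self):
        return len(self._rows)


table = StatusUpdates()
first_id, second_id = uuid.uuid1(), uuid.uuid1()
table.insert("alice", first_id, "first update")
table.insert("alice", second_id, "second update")
# The same id under a different username is a different row.
table.insert("bob", first_id, "bob's update")
```

In this model neither column identifies a row by itself: alice owns two rows, and first_id appears in two rows. Only the combination is unique.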

In addition to this, our user_status_updates table is the first time we've seen a

timeuuid column in the wild. As you can recollect from the previous chapter, a UUID is

essentially a very large number that is generated using an algorithm that guarantees that the

identifier is unique across time and space.

You will also recollect that Cassandra does not have the ability to generate auto-incrementing

sequences for use in primary keys, as this would require an unacceptably high level of

coordination between nodes in the cluster. In the users table, we used a natural key; we

want each user to have a unique username, so the username column makes a perfectly

good unique identifier for rows.

In the case of a status update, however, there is no obvious natural key. The only

user-generated data associated with a status update is the body, but a free text field doesn't make

a very good primary key, and anyway there's no guarantee that status update bodies will be

unique. This is where UUIDs come in handy. Since they're guaranteed to be unique, we can

use them as a surrogate key: a unique identifier that isn't derived from the data in the row.

Auto-incrementing primary keys in relational databases are also surrogate keys.
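
As a rough illustration of a surrogate key, the following Python sketch attaches a freshly generated UUID to each status update. It uses the standard library's uuid module on the client side; the new_status_update helper and the sample data are assumptions made for the example, not part of the application built in this book.

```python
import uuid


def new_status_update(username, body):
    """Build a row with a generated surrogate id (hypothetical helper)."""
    # The id comes from the UUID generator, not from the username or
    # the body, so it stays unique even when the body is repeated.
    return {"username": username, "id": uuid.uuid1(), "body": body}


first = new_status_update("alice", "Learning Cassandra!")
second = new_status_update("alice", "Learning Cassandra!")
```

Both rows carry the same username and the same body, yet they can still be told apart by their id values.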

UUIDs and timestamps

While there are several algorithms that can be used to generate UUIDs, the Version 1

UUID has an additional useful property: part of the UUID encodes the timestamp at which

the UUID is generated. This timestamp can be extracted from the full UUID, meaning that

it's possible to know exactly when any Version 1 UUID was generated.
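
A minimal sketch of that extraction, using Python's standard uuid module: a Version 1 UUID stores its timestamp as a count of 100-nanosecond intervals since 15 October 1582, which Python exposes as the time attribute. The uuid1_to_datetime function name is an assumption made for this example.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 timestamps are counted from the start of the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_to_datetime(u):
    """Return the creation time encoded in a Version 1 UUID."""
    if u.version != 1:
        raise ValueError("expected a Version 1 UUID")
    # u.time is in 100 ns units; dividing by 10 gives microseconds,
    # the finest resolution that datetime can represent.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


created_at = uuid1_to_datetime(uuid.uuid1())
```

A randomly generated UUID, such as one from uuid.uuid4(), is rejected by this function because it is not a Version 1 UUID and carries no creation timestamp.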

Cassandra's timeuuid type lets us capitalize on that property. Cassandra is aware of the

structure of a timeuuid, and is able both to convert timestamps into UUIDs and to

extract the creation timestamp from a UUID. As we'll soon see, Cassandra can also sort our

rows by their creation time using the timestamps encoded in the UUIDs.
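
To make the idea concrete, here is a small Python sketch of the same two operations: building a Version 1 UUID from a timestamp, and ordering UUIDs by the timestamp they encode. It imitates the behavior outside of Cassandra and is not Cassandra's own implementation; timeuuid_from_ticks and sort_by_creation_time are hypothetical helper names, and the clock sequence and node values are arbitrary placeholders.

```python
import uuid


def timeuuid_from_ticks(ticks, clock_seq=0, node=0):
    """Build a Version 1 UUID from 100 ns ticks since 1582-10-15."""
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    # The top four bits of this field hold the version number (1).
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000
    # The top two bits of this field mark the RFC 4122 variant.
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def sort_by_creation_time(ids):
    """Order UUIDs by encoded timestamp, oldest first."""
    # Ties on the timestamp are broken by the full 128-bit value.
    return sorted(ids, key=lambda u: (u.time, u.int))


older = timeuuid_from_ticks(0x1FFFFFFFF)
newer = timeuuid_from_ticks(0x200000000)
by_time = sort_by_creation_time([newer, older])
by_raw_value = sorted([newer, older])
```

The two orderings differ because a Version 1 UUID stores the low bits of its timestamp first: comparing the raw 128-bit values places newer before older here, while comparing the decoded timestamps gives the chronological order. Sorting by creation time therefore requires an awareness of the UUID's internal structure.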
