Database Reference
In-Depth Information
Other uses of secondary indexes
We've seen that secondary indexes neatly allow us to turn a denormalized relationship
structure into a normalized one: the user_follows table is able to answer all the im-
portant questions about follow relationships without any duplication of data. This is but one
of the many use cases for which secondary indexes are well suited.
Secondary indexes are best suited for low-cardinality columns, which is to say columns
that contain the same value for many rows. An example might be a location column on
the users table; if this is restricted to city and state, many users will share the same loca-
tion. In fact, we will add a location column to the users table in Adding columns to
tables section in Chapter 7 , Expanding Your Data Model . If we wanted to be able to answer
questions such as "Who are all of the users that live in New York?" that index would be
quite useful.
Secondary indexes can also be used for columns whose values are unique, such as the
email column in the users table. If, for instance, we wanted to build a "forgot pass-
word" feature in which the user enters their email address, we'd be able to use an index on
email to look up the user's record.
Use caution when creating indexes on unique or high-cardinality columns; these indexes
can only handle moderate query volume. The DataStax Cassandra reference warns us:
"For columns containing unique data, it is sometimes fine performance-wise to use an
index for convenience, as long as the query volume to the table having an indexed
column is moderate and not under constant load."
Another interesting use case for secondary indexes would be for easier lookup of status up-
dates from the user_status_updates table. Recall that that table's partition key is the
username of the user who wrote the status update, and the clustering column is a UUID
column. In order to look up a specific status update, we would normally have to specify
both username and id values. But the id column alone uniquely identifies each status
update, because UUIDs are globally unique. So, if we place a secondary index on the id
column, we can look up status updates more tersely, which is useful to generate URLs and
the like.
Search WWH ::




Custom Search