Establishing Relationships - Learning Apache Cassandra

Database Reference

In-Depth Information

Other uses of secondary indexes

We've seen that secondary indexes neatly allow us to turn a denormalized relationship

structure into a normalized one: the user_follows table is able to answer all the im-

portant questions about follow relationships without any duplication of data. This is but one

of the many use cases for which secondary indexes are well suited.

Secondary indexes are best suited for low-cardinality columns, which is to say columns

that contain the same value for many rows. An example might be a location column on

the users table; if this is restricted to city and state, many users will share the same loca-

tion. In fact, we will add a location column to the users table in Adding columns to

tables section in Chapter 7 , Expanding Your Data Model . If we wanted to be able to answer

questions such as "Who are all of the users that live in New York?" that index would be

quite useful.

Secondary indexes can also be used for columns whose values are unique, such as the

email column in the users table. If, for instance, we wanted to build a "forgot pass-

word" feature in which the user enters their email address, we'd be able to use an index on

email to look up the user's record.

Use caution when creating indexes on unique or high-cardinality columns; these indexes

can only handle moderate query volume. The DataStax Cassandra reference warns us:

"For columns containing unique data, it is sometimes fine performance-wise to use an

index for convenience, as long as the query volume to the table having an indexed

column is moderate and not under constant load."

Another interesting use case for secondary indexes would be for easier lookup of status up-

dates from the user_status_updates table. Recall that that table's partition key is the

username of the user who wrote the status update, and the clustering column is a UUID

column. In order to look up a specific status update, we would normally have to specify

both username and id values. But the id column alone uniquely identifies each status

update, because UUIDs are globally unique. So, if we place a secondary index on the id

column, we can look up status updates more tersely, which is useful to generate URLs and

the like.

Search WWH ::

Custom Search

Home