Database Reference
In-Depth Information
Limitations of secondary indexes
We've seen that secondary indexes are useful in a range of scenarios, and we'll explore
without limitations and downsides.
Secondary indexes can only have one column
One major limitation of secondary indexes is that they can only target one column. It would
not be legal, for instance, to specify a secondary index such as:
CREATE INDEX ON "users" ("email", "encrypted_password");
We'll explore one workaround to this limitation in the
Working with tuples
section of
Secondary indexes can only be tested for equality
In the section on
Retrieving status updates for a specific time range
in
Chapter 4
,
Beyond
Key-Value Lookup
, we explored the use of inequality operators to select ranges of columns.
Secondary indexes can only be queried for equality; queries such as the following are not
possible:
SELECT * FROM "user_follows"
WHERE "follower_username" > 'alice';
Secondary index lookup is not as efficient as primary key lookup
Although secondary indexes give us a reasonably efficient way to look up rows using a
non-partition key column, they're not as efficient as queries based on the primary key we've
explored in previous chapters. This is because lookup by a secondary index is a two-step
process. First, Cassandra will access the secondary index to find the primary keys of all
rows matching the query. Second, it will access the table itself to retrieve the matched
rows. The second step will generally involve querying over many partitions, or at least over
disjoint ranges of a single partition. These random reads over the data can never be as effi-
cient as a single focused read of one range of one partition.
For this reason, it's best to avoid using secondary indexes for the core data access patterns
of your application. Our structure in the
user_follows
table bears this out: answering
the question, "Who follows
alice
?", is central to the process of broadcasting her status