Multiple partitions and read efficiency
In the real world, most users will likely follow dozens or hundreds of other users. In this
case, our WHERE…IN clause will specify hundreds of partitions. Remember from Chapter 3,
Organizing Related Data, that each partition is stored separately by Cassandra; querying
hundreds of partitions would require hundreds of random accesses. In fact, Cassandra's
official documentation warns us against using WHERE…IN in most circumstances:
"Under most conditions, using IN in the WHERE clause is not recommended. Using
IN can degrade performance because usually many nodes must be queried."
Furthermore, in this particular case, Cassandra has to retrieve one page of rows from each
partition, perform an ordered merge, and throw away all but the most recent handful. For instance,
if I follow 100 users and have a page size of 10, Cassandra must retrieve 1,000 rows just to
figure out which 10 are the most recent. While this approach technically works, its read
performance characteristics aren't going to cut it for a production application. We'll need to
explore different strategies.
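
To make the arithmetic concrete, here is a minimal Python sketch that simulates the read pattern described above. It is not Cassandra's actual implementation: the function name newest_first_page, the tuple layout, and the synthetic timestamps are all assumptions made for illustration. Each in-memory list stands in for one partition, already sorted newest first.

```python
import heapq
from itertools import islice


def newest_first_page(partitions, page_size):
    """Return (page, rows_read) for a merged read over several partitions.

    Each partition is a list of (timestamp, username, body) tuples that is
    already sorted newest first. This layout is hypothetical; it only mimics
    rows clustered by descending time within a partition.
    """
    # One full page is fetched from every partition, needed or not.
    fetched = [partition[:page_size] for partition in partitions]
    rows_read = sum(len(rows) for rows in fetched)

    # Ordered merge of the per-partition pages, newest first.
    merged = heapq.merge(*fetched, key=lambda row: row[0], reverse=True)

    # Keep only the first page of the merged stream.
    page = list(islice(merged, page_size))
    return page, rows_read


# 100 followed users, each with 50 status updates at distinct timestamps.
partitions = [
    [(ts * 100 + user, f"user{user}", f"update {ts}") for ts in range(49, -1, -1)]
    for user in range(100)
]

page, rows_read = newest_first_page(partitions, page_size=10)
rows_discarded = rows_read - len(page)

print(f"rows read: {rows_read}, returned: {len(page)}, discarded: {rows_discarded}")
```

With 100 partitions and a page size of 10, the sketch reads 1,000 rows and discards 990 of them, which matches the figures in the text. The wasted work grows linearly with the number of users followed.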