Troubleshooting - Practical Cassandra


There are a number of ways to tell that the read capacity of your system isn't keeping up. The first is to use nodetool cfstats to see how many SSTables are in the ColumnFamily. If that number is continually increasing, your cluster's I/O capacity isn't high enough to keep up with the write load. Because compactions aren't completing quickly enough to group the necessary data together in the SSTables, the data becomes fragmented across many SSTables, and each read has to consult more of them. The way to fix this is to add I/O capacity, either by increasing the disk speed (with something like SSDs) or by increasing the number of nodes in the cluster.
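One way to watch the SSTable count over time is to save the cfstats output periodically and compare the numbers. The following is a minimal Python sketch of that idea, not a definitive tool. It assumes the output contains a "SSTable count: N" line under each ColumnFamily header; the header label has differed between Cassandra versions, so both "Column Family:" and "Table:" are accepted. The function names and the sample text are illustrative only.

```python
import re

# Matches the per-ColumnFamily "SSTable count: N" line in cfstats output.
SSTABLE_RE = re.compile(r"^\s*SSTable count:\s*(\d+)\s*$")
# Header label is assumed to be "Column Family:" or "Table:" depending on version.
NAME_RE = re.compile(r"^\s*(?:Column Family|Table):\s*(\S+)")

# Illustrative sample only; real cfstats output has many more fields.
SAMPLE = """\
Keyspace: demo
    Column Family: users
    SSTable count: 12
    Column Family: events
    SSTable count: 3
"""


def sstable_counts(cfstats_output):
    """Return {column_family_name: sstable_count} parsed from cfstats text."""
    counts = {}
    current = None
    for line in cfstats_output.splitlines():
        name_match = NAME_RE.match(line)
        if name_match:
            current = name_match.group(1)
            continue
        count_match = SSTABLE_RE.match(line)
        if count_match and current is not None:
            counts[current] = int(count_match.group(1))
    return counts


def is_steadily_increasing(samples):
    """True if there are at least two samples and each is larger than the last."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))
```

Calling sstable_counts on each saved snapshot and passing the resulting series for one ColumnFamily to is_steadily_increasing gives a rough signal that compaction is falling behind. A strict "every sample is larger" test is a deliberately simple choice; a real check would probably tolerate short dips after a compaction completes.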

On the other hand, if the SSTable count is low, compare the file cache on each machine with the read pattern. You can estimate the amount of file cache with the formula total_system_memory - JVM_heap_size. If the amount of data is greater than that and you have a roughly random read pattern, then only a fraction of reads about equal to the cache-to-data ratio can be served from the cache; the rest will need to seek to the disk. In that case, you may be able to deal with some of the read issues by enabling key or row caches (by setting KEYS_ONLY, ROWS_ONLY, or ALL). If you enable row caching, keep the row cache relatively small (about 20,000 rows); the key cache can be at 100%.
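The file cache estimate can be worked through with a few lines of Python. This is a rough sketch under stated assumptions: reads are uniformly random across the data set, all memory outside the JVM heap is available as file cache, and the function names and the sizes in the usage note are hypothetical.

```python
GIB = 1024 ** 3


def file_cache_bytes(total_system_memory, jvm_heap_size):
    """Estimate the file cache as total_system_memory - JVM_heap_size."""
    return max(total_system_memory - jvm_heap_size, 0)


def estimated_disk_read_fraction(total_system_memory, jvm_heap_size, data_size):
    """Estimate the fraction of random reads that must seek to disk.

    Assumes a uniformly random read pattern, so the cached fraction is
    approximated by the cache-to-data ratio, capped at 1.0.
    """
    if data_size <= 0:
        return 0.0
    cache = file_cache_bytes(total_system_memory, jvm_heap_size)
    cached_fraction = min(cache / data_size, 1.0)
    return 1.0 - cached_fraction
```

For a hypothetical node with 32 GiB of memory, an 8 GiB heap, and 96 GiB of data, file_cache_bytes(32 * GIB, 8 * GIB) gives 24 GiB, the cache-to-data ratio is 0.25, and estimated_disk_read_fraction(32 * GIB, 8 * GIB, 96 * GIB) gives 0.75. In other words, roughly three out of four random reads would go to disk.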

Freezing Nodes

You may run into a situation where the operating system is still responding normally, but Cassandra seems to be moving slowly. The first thing to check is whether garbage collection is running. In your Cassandra system.log, look for entries that reference GCInspector, indicating that either the ParNew or the ConcurrentMarkSweep collector is taking a long time to run. You will likely see entries similar to Listing 10.5. These entries were pulled from a machine that is having GC issues. Notice that the total time spent in GC is high (ranging from a few seconds up to a few minutes).
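Scanning system.log for these entries by hand gets tedious, so a small script can help. The sketch below pulls long pauses out of GCInspector lines. It assumes the message format of the ParNew entry in the listing that follows ("GC for <collector>: <N> ms for <M> collections, <used> used; max is <max>"). The 1,000 ms threshold and the function name are illustrative choices, not Cassandra settings.

```python
import re

# Pattern based on the GCInspector message format shown in the listing.
GC_RE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<count>\d+) collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)

# The ParNew entry from the listing, joined onto one line.
SAMPLE_LINE = (
    "INFO [ScheduledTasks:1] 2013-02-20 15:40:57,096 GCInspector.java (line 122) "
    "GC for ParNew: 17305 ms for 1 collections, 2634113808 used; max is 7432306688"
)


def long_gc_pauses(log_lines, threshold_ms=1000):
    """Return (collector, milliseconds) for GCInspector entries at or above threshold_ms."""
    pauses = []
    for line in log_lines:
        if "GCInspector" not in line:
            continue
        match = GC_RE.search(line)
        if match and int(match.group("ms")) >= threshold_ms:
            pauses.append((match.group("collector"), int(match.group("ms"))))
    return pauses
```

Passing an open system.log file object as log_lines works because the function only iterates over lines. For the sample entry, long_gc_pauses([SAMPLE_LINE]) returns [("ParNew", 17305)], which is a pause of more than 17 seconds.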

Listing 10.5 Example Log Entries for Long-Running GCs

INFO [ScheduledTasks:1] 2013-02-20 15:40:57,096 GCInspector.java (line 122) GC for ParNew: 17305 ms for 1 collections, 2634113808 used; max is 7432306688

INFO [GC inspection] 2013-02-20 15:49:45,973 GCIn-
