Monitoring - Practical Cassandra


case with nodes with high ping times), writes can be slow to come in and register, reads and writes may be dropped to keep up with the demand being put on the system, or any number of other odd behaviors may appear. What counts as a high ping time from your monitoring server depends largely on your network paths. Run a few ping tests from your monitoring server to your Cassandra nodes during regular usage periods to get a feel for what a normal threshold is.
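One way to turn those baseline ping tests into a concrete alert threshold is sketched below. It assumes you have already collected round-trip times in milliseconds during normal usage. The "mean plus k standard deviations" rule, the default k=3, and the function names are our own assumptions for illustration, not something the text prescribes.

```python
import statistics

def ping_alert_threshold(baseline_rtts_ms, k=3.0):
    """Hypothetical helper: derive an alert threshold (ms) from ping
    round-trip times sampled during regular usage periods.

    Assumption: anything more than k standard deviations above the
    baseline mean is treated as a 'high' ping time."""
    if len(baseline_rtts_ms) < 2:
        raise ValueError("need at least two baseline samples")
    mean = statistics.mean(baseline_rtts_ms)
    spread = statistics.stdev(baseline_rtts_ms)  # sample standard deviation
    return mean + k * spread

def is_high_ping(rtt_ms, threshold_ms):
    """Hypothetical helper: flag a single measurement against the threshold."""
    return rtt_ms > threshold_ms
```

Recompute the baseline periodically, since "normal" shifts as network paths and traffic change.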

CPU Usage

Cassandra is usually an I/O-bound system: disk reads or writes typically slow down long before the CPU becomes the bottleneck. Still, because different workloads stress different resources at different times, you should monitor CPU usage as well. There are many things you could look for, such as context switches or interrupt requests, but watching the system load average is usually a good place to start.

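The system load average is the average number of processes that are running or waiting in the system's run queue over a period of time. On Linux it also counts processes blocked in uninterruptible sleep, which usually means they are waiting on disk I/O. The uptime command reports it over one, five, and 15 minutes. Keep in mind that on multiprocessor systems the load is relative to the number of processors and cores: a load of 1.00 means a single-core machine is fully busy, while a four-core machine doesn't reach that point until the load hits 4.00.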

A common rule of thumb is that you want a machine working hard but not overworked, which typically means running at about 70% utilization. That leaves headroom for spikes in work without leaving the machine underutilized during slower periods. So on a four-core machine, a load that sits around 3.00 is usually a safe bet. If the load on that machine reaches 3.5 or higher, find out what's wrong and fix it before things go from bad to worse.
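The per-core arithmetic above can be automated. The sketch below scales the text's four-core example (investigate at 3.5) to any core count, giving 0.875 per core; generalizing it this way, and the function names, are assumptions. It uses Python's Unix-only `os.getloadavg()`, which returns the same one-, five-, and 15-minute averages that uptime shows, and `os.cpu_count()`, which counts logical CPUs (so hyperthreads are included).

```python
import os

# Alarm level per core, derived from the text's example of 3.5 on a
# four-core machine. Applying it to other core counts is an assumption.
ALARM_PER_CORE = 3.5 / 4  # 0.875

def load_needs_attention(load_avg, cores):
    """Hypothetical helper: True when the load per core reaches the alarm level."""
    return load_avg / cores >= ALARM_PER_CORE

def check_local_load():
    """Hypothetical helper: read the local 1/5/15-minute load averages
    (Unix only) and flag each one relative to the logical CPU count."""
    one, five, fifteen = os.getloadavg()
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return {
        window: (value, load_needs_attention(value, cores))
        for window, value in (("1m", one), ("5m", five), ("15m", fifteen))
    }
```

Alerting on the 5- or 15-minute value rather than the 1-minute value avoids paging on brief spikes, which is exactly the headroom the 70% target is meant to absorb.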

Cassandra-Specific Health Checks

Once you have the basic system checks in place, it's time to add monitoring that is specific to Cassandra. Various checks interact with Cassandra at different levels of the system. Some are superficial, such as checking whether ports are open and being listened on. Others require a more in-depth toolset to programmatically check the MBeans described earlier.
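A superficial port check can be as simple as opening a TCP connection. The sketch below assumes a node running with Cassandra's default ports (7000 for inter-node traffic, 7199 for JMX, 9042 for the CQL native transport; older releases also listen on 9160 for Thrift). Verify these against your own cassandra.yaml and cassandra-env.sh. The helper names are hypothetical.

```python
import socket

# Assumed defaults; adjust to match your cassandra.yaml / cassandra-env.sh.
DEFAULT_PORTS = {
    "inter-node (storage_port)": 7000,
    "JMX": 7199,
    "CQL native transport": 9042,
}

def port_is_listening(host, port, timeout=2.0):
    """Hypothetical helper: True if a TCP connection to host:port succeeds.

    Connection refused, unreachable hosts, and timeouts all raise
    OSError subclasses, so they are reported as False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_node_ports(host, ports=DEFAULT_PORTS, timeout=2.0):
    """Hypothetical helper: map each named port to whether it is listening."""
    return {name: port_is_listening(host, port, timeout) for name, port in ports.items()}
```

A successful connection only proves that something is listening. It says nothing about whether Cassandra is healthy, which is why the deeper MBean checks are still needed.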
