case with nodes with high ping times), writes can be slow to come in and register,
reads and writes will be dropped to keep up with the demand being put on the
system, or any number of other weird behaviors may appear. What constitutes a high
ping time from your monitoring server depends to a great extent on your network
paths. Run a few ping tests from your monitoring server to your Cassandra nodes
during regular usage periods to get a feel for what a normal threshold is.
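
One way to turn those baseline measurements into a number is sketched below. It takes round-trip times you have already recorded (for example, copied from ping output) and derives an alert threshold from them. The function name latency_threshold, the median-plus-deviation formula, and the default multiplier are all illustrative assumptions, not values from the text; tune them against your own network.

```python
import statistics


def latency_threshold(samples_ms, multiplier=3.0):
    """Return an alert threshold (ms) from baseline round-trip times.

    Assumption: the threshold is the median plus `multiplier` times the
    median absolute deviation. Both the formula and the default multiplier
    are illustrative choices, not recommendations from the text.
    """
    if not samples_ms:
        raise ValueError("need at least one sample")
    median = statistics.median(samples_ms)
    # Median absolute deviation: robust to the occasional slow outlier.
    deviations = [abs(s - median) for s in samples_ms]
    mad = statistics.median(deviations)
    return median + multiplier * mad


# Hypothetical round-trip times (ms) gathered during regular usage.
baseline = [1.0, 1.2, 1.1, 1.3, 5.0]
threshold = latency_threshold(baseline)
```

Using the median rather than the mean keeps a single slow reply (the 5.0 ms sample here) from inflating the threshold.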
CPU Usage
Cassandra is usually an I/O-bound system. You usually run into problems with
disk writes or reads slowing down long before you run into CPU-related slowdown.
But because different workloads stress a system in different ways, you should
monitor CPU usage anyway. While there are many things you could look for when
monitoring CPU usage, such as context switches or interrupt requests, a good
place to start is usually watching the system load average.
The system load average is the average number of processes that are running or
waiting in the system's run queue over a period of time (on Linux, processes in
uninterruptible sleep, typically waiting on disk I/O, are counted as well). In
the case of the uptime command, it's over one, five, and 15 minutes. Keep in mind
that in the case of multiprocessor systems, the load is relative to the number of
processors and cores on the system.
The common rule for utilization is that you want to have a machine working
hard but not overworking. This means that you typically want to have the machine
running at about 70% utilization. That leaves you headroom for spikes in work
and doesn't leave the machine underutilized during slower periods. So if you have
four cores, 70% works out to a load of 2.8, and having the load sit at around
3.00 is usually a safe bet. If you have four cores and the load is 3.5 or higher,
you should try to find out what's wrong and fix it before things go from bad to
worse.
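
The four-core example can be expressed as a per-core ratio: 3.00 / 4 = 0.75 is healthy and 3.5 / 4 = 0.875 is the point to investigate. The sketch below applies that ratio to any core count. The names load_per_core, classify_load, and check_local_host are hypothetical, and two things are assumptions rather than anything the text states: that the four-core figure generalizes per core, and that the five-minute average is the one to check.

```python
import os

# Derived from the four-core example: a load of 3.5 on 4 cores is 0.875.
# Assumption: the same per-core ratio applies to other core counts.
ALERT_RATIO = 0.875


def load_per_core(load_avg, cores):
    """Normalize a load average by the number of cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_avg / cores


def classify_load(load_avg, cores, alert_ratio=ALERT_RATIO):
    """Return "investigate" at or above the alert ratio, otherwise "ok"."""
    if load_per_core(load_avg, cores) >= alert_ratio:
        return "investigate"
    return "ok"


def check_local_host():
    """Classify this machine's load (Unix only: uses os.getloadavg)."""
    one_min, five_min, fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1  # cpu_count() can return None
    # Assumption: the five-minute average smooths out short spikes.
    return classify_load(five_min, cores)
```

Calling check_local_host() from a monitoring script on each Cassandra node returns "ok" or "investigate" for that node; classify_load(3.0, 4) and classify_load(3.5, 4) reproduce the two cases from the text.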
Cassandra-Specific Health Checks
Once you have the basic system checks in place, it's time to add monitoring that
is specific to Cassandra. There are various checks that interact with Cassandra at
different levels of the system. Some are superficial, such as checking to see
whether ports are open and being listened on. Others require a slightly more
in-depth toolset to programmatically check the MBeans described earlier.
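
A superficial port check of the kind described above can be sketched as a plain TCP connection attempt. The port numbers are Cassandra's usual defaults (9042 for the native CQL transport, 7000 for inter-node traffic, 7199 for JMX) and are an assumption here, since the text does not list them; confirm them against your own cassandra.yaml. The function names and the host address in the usage note are hypothetical.

```python
import socket

# Assumption: default ports; adjust to match your cluster's configuration.
DEFAULT_PORTS = {
    "native_transport": 9042,
    "inter_node": 7000,
    "jmx": 7199,
}


def port_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports=None, timeout=2.0):
    """Return a dict mapping each port's label to whether it is listening."""
    ports = DEFAULT_PORTS if ports is None else ports
    return {
        name: port_is_listening(host, port, timeout)
        for name, port in ports.items()
    }
```

For example, check_ports("10.0.0.12") would return a dictionary such as {"native_transport": True, ...} for a node at that (made-up) address. A successful connection only shows that something is listening on the port, not that Cassandra is healthy, which is why the MBean-based checks are still needed.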