Database Reference
Ports
There are three primary ports of interest to Cassandra: 7000 (or 7001 if SSL/TLS
is enabled), 7199, and 9160. Port 7000/7001 is used by Cassandra for cluster
communication. This includes things such as the Gossip protocol and failure
detection. Port 7199 is used by JMX. Port 9160 is the Thrift port and is used for
client communication. In order for your cluster to function properly, all of these
ports should be accessible.
While it is not necessary to monitor these ports individually, it is a good idea
to verify that they are reachable. Testing the Thrift port (9160) amounts to
testing whether you can connect to an instance using a Cassandra driver. In terms
of monitoring, if you can connect, the check passes. If you can't connect to the
server, the check should send off an alert. You can also use a simple TCP check
here, even though it is less comprehensive: it confirms only that the port accepts
connections, not that Cassandra can actually serve requests.
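The simple TCP check mentioned above could look something like the following
sketch. It only attempts a TCP connection to each port and reports whether the
connection succeeded. The function names, the host address, and the timeout are
assumptions made for illustration, not part of any Cassandra tooling.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port number to True (reachable) or False (unreachable)."""
    return {port: port_is_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Ports named in the text; use 7001 instead of 7000 if SSL/TLS is enabled.
    # The host address here is a placeholder for one of your own nodes.
    results = check_ports("127.0.0.1", [7000, 7199, 9160])
    for port, ok in sorted(results.items()):
        print(f"port {port}: {'open' if ok else 'UNREACHABLE'}")
```

A monitoring system would typically run a check like this from the machine that
needs the access (another node for 7000/7001, a client host for 9160) and alert
on any port reported as unreachable.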
JMX Checks
Using some of the knowledge we gained from looking at the normal behavior of
our system with JConsole, we are going to add some checks using JMX. There are
plug-ins for Nagios that enable you to run JMX queries and compare the results
against a set of predetermined thresholds. While there are many values that can be
monitored through JMX, there are a few that stand out.
The first set of JMX checks to create is for read and write request latency. These
values are given in microseconds because they should be that small. These
latencies can be measured at the Cassandra application level and/or at the
ColumnFamily level. Measuring them at the application level is important as a
general health metric. High request latencies can indicate a bad disk or a read
pattern that is starting to slow down. If there is a ColumnFamily for which it is
particularly important to have extremely low-latency reads and/or writes, it is
a good idea to monitor the performance of that ColumnFamily as well. Note that
read latency and write latency are two separate metrics provided by Cassandra,
and both are important in their own right depending on your workload.
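As a rough illustration of comparing a latency value against predetermined
thresholds, the sketch below classifies a reading in microseconds using the
standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). It
does not perform the JMX query itself; the function names, sample values, and
threshold numbers are hypothetical and would need to be chosen from the normal
behavior of your own cluster.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value_us, warn_us, crit_us):
    """Compare a latency in microseconds against warning/critical thresholds."""
    if value_us is None:
        return UNKNOWN  # the metric could not be read
    if value_us >= crit_us:
        return CRITICAL
    if value_us >= warn_us:
        return WARNING
    return OK


def check_latency(name, value_us, warn_us, crit_us):
    """Return (exit_code, message) in the one-line form Nagios plugins print."""
    status = classify(value_us, warn_us, crit_us)
    shown = "n/a" if value_us is None else f"{value_us:.0f}us"
    return status, f"{STATUS_NAMES[status]} - {name} latency {shown}"


if __name__ == "__main__":
    # Hypothetical readings and thresholds; real values would come from a JMX query.
    samples = [("read", 850.0, 5000, 20000), ("write", 26000.0, 5000, 20000)]
    for name, value, warn, crit in samples:
        status, message = check_latency(name, value, warn, crit)
        print(message)
```

Read and write latency are checked separately here because, as noted above, they
are separate metrics and may warrant different thresholds.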
The next set of JMX metrics to keep tabs on is garbage collection timing.
Cassandra will tell you not only how long its last garbage collection took but
also how long the last ParNew GC took. A good way to think of ParNew garbage
collection is that it is a stop-the-world garbage collection that uses multiple GC
threads to complete its job. If you are monitoring the amount of time these take,
you can