Monitoring - Practical Cassandra

That covers only the case of bad things happening at the INFO level. You can also have the log monitoring system alert you if any FATAL, ERROR, or WARNING messages are written to the logs. Many of these plug-ins are configurable enough to send the log messages (or at least the one that triggered the notification) along with the alert.
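
As a rough illustration of this kind of check, the sketch below scans log lines for the alerting levels and bundles the offending lines into the alert text. It assumes the level appears as a whole word on each line; the function names and the sample layout are made up for this example and do not come from any particular monitoring plug-in.

```python
import re

# Assumption: the log level appears as a whole word on the line, for example
# "ERROR [Thread-1] 2013-10-01 12:00:01,000 Disk failure".
# A message body that happens to contain one of these words will also match,
# so tighten the pattern to fit your own log layout.
LEVEL_PATTERN = re.compile(r"\b(FATAL|ERROR|WARN(?:ING)?)\b")


def find_alert_lines(lines):
    """Return (level, line) pairs for every line that carries an alert level."""
    hits = []
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            hits.append((match.group(1), line.rstrip("\n")))
    return hits


def build_alert_message(hits):
    """Format the offending log lines so they can be sent along with the alert."""
    return "\n".join(f"[{level}] {line}" for level, line in hits)
```

A monitoring check could call find_alert_lines on the lines added since the last run and raise an alert whenever the result is non-empty.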

Cassandra Interactions

Now that we have the OS and system layer monitored and we know Cassandra is up and at least responding, it's time to check a little deeper. The further into the application you monitor, the better you will be able to sleep at night knowing things are functioning the way you want them to. Although it is useful and necessary to have superficial checks like load average and memory, the real value of monitoring systems is realized as you get deeper into the application.

What this means is that you should be checking things that are specific to your application in addition to the Cassandra server. If your application writes to a new ColumnFamily at the beginning of every month, you should have your monitoring system check before the month turnover that the new ColumnFamily exists (and optionally create it if it doesn't).
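
A minimal sketch of the month-turnover check might look like the following. The naming scheme (events_YYYY_MM), the function names, and the way the list of existing ColumnFamilies is obtained are all assumptions for illustration; substitute whatever your application and client library actually use.

```python
from datetime import date


def next_month_cf_name(today, prefix="events"):
    """Build next month's ColumnFamily name, e.g. events_2014_01.

    The prefix and the name layout are hypothetical.
    """
    if today.month == 12:
        year, month = today.year + 1, 1
    else:
        year, month = today.year, today.month + 1
    return f"{prefix}_{year:04d}_{month:02d}"


def check_next_month_cf(today, existing_names, create=None):
    """Return True if next month's ColumnFamily exists or was just created.

    existing_names: names currently defined in the keyspace, fetched however
        your client exposes schema metadata.
    create: optional callable that creates a ColumnFamily given its name.
    """
    name = next_month_cf_name(today)
    if name in existing_names:
        return True
    if create is not None:
        create(name)
        return True
    return False
```

Run a few days before the end of the month, a False result gives you time to fix the problem before the application starts writing.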

Another good use of monitoring resources is to check the response time of certain queries. If you are regularly running queries that roll up all the events for an hour, monitor how long that query takes to run and set up an alert if it falls outside the normal range. The range applies in both directions. If the query runs too fast, you want to know because it's possible you aren't collecting all the data you expect to be there. If the query takes too long to run, your system could be under heavy load or you may have just hit a point where you need to rethink your query patterns. Either way, that type of instrumentation is useful to measure how your system actually performs compared to how you expect it to perform.
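
The two-sided threshold described above can be sketched as follows. The function names and the idea of passing the query in as a callable are assumptions made for this example; the low and high bounds would come from what you have observed to be normal for your own rollup query.

```python
import time


def time_query(run_query):
    """Run the query callable and return (result, elapsed_seconds)."""
    start = time.monotonic()
    result = run_query()
    return result, time.monotonic() - start


def classify_duration(elapsed, low, high):
    """Compare an elapsed time in seconds against the expected [low, high] range."""
    if elapsed < low:
        return "too_fast"  # possibly not all the expected data is there
    if elapsed > high:
        return "too_slow"  # heavy load, or a query pattern worth rethinking
    return "ok"
```

The monitoring system would alert on anything other than "ok" and ideally record the elapsed time on every run so the normal range can be adjusted over time.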

If you run an application at the top of every hour, such as an extract, transform, load (ETL) process, it might be a good idea to have the application put a “run complete” column somewhere when it's done. At the beginning of every hour, the monitoring system can run a query to check for the existence of the column for the last hour. If the “run complete” column doesn't exist for the last hour, it would be good to know so you can look into why.
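
One way to picture the “run complete” check is the sketch below. The hour key format and the marker_exists callable are hypothetical; in practice marker_exists would wrap whatever query your client uses to look up the column for a given hour.

```python
from datetime import datetime, timedelta


def last_hour_key(now):
    """Key for the previous hour's run, e.g. '2013-10-01T11' (assumed format)."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    return (top_of_hour - timedelta(hours=1)).strftime("%Y-%m-%dT%H")


def last_run_completed(now, marker_exists):
    """Return True if the 'run complete' marker exists for the previous hour.

    marker_exists: callable taking an hour key and returning a truthy value
        when the marker column is present.
    """
    return bool(marker_exists(last_hour_key(now)))
```

The application would write the marker under the same key when it finishes, so both sides need to agree on the key format and the time zone.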
