This is just in case bad things happen at the INFO level. You can also have the log monitoring system alert you whenever FATAL, ERROR, or WARNING messages are written to the logs. Many of these plug-ins are configurable enough to send the log messages (or at least the one that triggered the notification) along with the alert.
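As a rough illustration of the alerting rule described above, the sketch below filters log lines down to the ones that should trigger a notification, so they can be attached to the alert. It assumes each line begins with its level name, optionally preceded by padding (as in a log4j-style `%5p [%t] ...` layout). It matches both `WARN` and `WARNING` because the exact spelling depends on the logging framework. The sample lines are made up; adjust the regular expression to your own log layout.

```python
import re
from typing import Iterable, List

# Assumption: the level name is the first token on each line, possibly padded
# with leading spaces. Lines whose level appears elsewhere are not matched.
ALERT_LEVELS = re.compile(r"^\s*(FATAL|ERROR|WARN(?:ING)?)\b")


def alertable_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines whose level should trigger an alert."""
    return [line.rstrip("\n") for line in lines if ALERT_LEVELS.match(line)]


if __name__ == "__main__":
    # Made-up sample lines for illustration only.
    sample = [
        " INFO [main] starting up\n",
        " WARN [GC] heap is nearly full\n",
        "ERROR [Thread-3] unexpected exception\n",
    ]
    for line in alertable_lines(sample):
        print(line)
```

Anchoring the match at the start of the line keeps an INFO message that merely mentions the word "ERROR" from raising a false alarm.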
Cassandra Interactions
Now that we have the OS and system layer monitored and we know Cassandra is up and at least responding, it's time to look a little deeper. The further into the application you monitor, the better you will sleep at night, knowing things are functioning the way you want them to. Superficial checks like load average and memory usage are useful and necessary, but the real value of a monitoring system is realized as you get deeper into the application.
In practice, this means you should check things that are specific to your application in addition to the Cassandra server itself. If your application writes to a new ColumnFamily at the beginning of every month, have your monitoring system verify before the month turns over that the new ColumnFamily exists (and optionally create it if it doesn't).
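A minimal sketch of the monthly check, assuming a hypothetical naming convention of one ColumnFamily per month called `events_YYYY_MM`. The list of existing ColumnFamily names is passed in, because how you fetch the schema depends on your client library and is not shown here. The function returns the missing name so the monitoring system can alert on it or create it.

```python
from datetime import date
from typing import Iterable, Optional

# Hypothetical naming convention: one ColumnFamily per month, e.g. "events_2011_07".
CF_PREFIX = "events"


def next_month(today: date) -> date:
    """Return the first day of the month after `today`."""
    if today.month == 12:
        return date(today.year + 1, 1, 1)
    return date(today.year, today.month + 1, 1)


def monthly_cf_name(day: date, prefix: str = CF_PREFIX) -> str:
    """Build the (hypothetical) ColumnFamily name for the month containing `day`."""
    return f"{prefix}_{day.year:04d}_{day.month:02d}"


def missing_next_month_cf(existing: Iterable[str], today: date) -> Optional[str]:
    """Return next month's ColumnFamily name if it does not exist yet, else None.

    `existing` holds the ColumnFamily names currently in the keyspace,
    fetched however your client exposes the schema.
    """
    wanted = monthly_cf_name(next_month(today))
    return None if wanted in set(existing) else wanted
```

Run this a few days before the turnover rather than on the last day, so there is time to react if the ColumnFamily is missing.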
Another good use of monitoring resources is checking the response time of certain queries. If you regularly run queries that roll up all the events for an hour, monitor how long each query takes and set up an alert if it falls outside its normal range. That range has two ends. If the query runs too fast, you want to know, because you may not be collecting all the data you expect to be there. If the query takes too long, your system could be under heavy load, or you may have reached the point where you need to rethink your query patterns. Either way, this kind of instrumentation measures how your system actually performs compared to how you expect it to perform.
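The two-sided threshold described above can be sketched as follows. `run_query` stands in for whatever callable executes your hourly roll-up, and the `min_ok`/`max_ok` window is a hypothetical value you would derive from the query's observed history; neither comes from Cassandra itself.

```python
import time
from typing import Any, Callable, Tuple


def classify_duration(seconds: float, min_ok: float, max_ok: float) -> str:
    """Label a query duration relative to the expected [min_ok, max_ok] window."""
    if seconds < min_ok:
        return "too_fast"  # possibly missing data
    if seconds > max_ok:
        return "too_slow"  # heavy load, or a query pattern worth rethinking
    return "ok"


def timed_query(
    run_query: Callable[[], Any], min_ok: float, max_ok: float
) -> Tuple[Any, float, str]:
    """Run `run_query` and return (result, elapsed seconds, classification)."""
    start = time.perf_counter()
    result = run_query()
    elapsed = time.perf_counter() - start
    return result, elapsed, classify_duration(elapsed, min_ok, max_ok)
```

Keeping the classification separate from the timing makes the thresholds easy to test, and any label other than `"ok"` can be handed to the monitoring system as an alert.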
If you run an application at the top of every hour, such as an extract, transform, load (ETL) process, it might be a good idea to have the application write a “run complete” column somewhere when it finishes. At the beginning of every hour, the monitoring system can run a query to check whether that column exists for the previous hour. If the “run complete” column for the previous hour is missing, you want to know so you can look into why.
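A minimal sketch of the hourly check, assuming the ETL job records its “run complete” marker under a hypothetical hour key of the form `YYYYMMDDHH`. The set of markers found by your query is passed in, so the sketch only covers the key arithmetic and the existence test.

```python
from datetime import datetime, timedelta
from typing import Container


def hour_key(moment: datetime) -> str:
    """Hypothetical key for the hour containing `moment`, e.g. '2011070113'."""
    return moment.strftime("%Y%m%d%H")


def previous_hour_key(now: datetime) -> str:
    """Key of the hour that just ended, relative to `now`."""
    start_of_this_hour = now.replace(minute=0, second=0, microsecond=0)
    return hour_key(start_of_this_hour - timedelta(hours=1))


def last_run_completed(markers: Container[str], now: datetime) -> bool:
    """True if a "run complete" marker exists for the previous hour."""
    return previous_hour_key(now) in markers
```

Subtracting a `timedelta` rather than decrementing the hour field handles day, month, and year boundaries correctly.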