Monitoring is important
Databases are not isolated entities. They live on computer hardware using CPUs, RAM, and
disk subsystems. Users access the database over networks. Depending on the setup, the
databases themselves may need network resources to function: performing authentication
checks when users log in, using disks that are mounted over the network (not generally
recommended), or making remote function calls to other databases.
This means that monitoring only the database is not enough. At a minimum, one should also
monitor everything directly involved in using the database, such as the following:
- Is the database host available? Does it accept connections?
- How much of the network bandwidth is in use? Have there been network
  interruptions and dropped connections?
- Is there enough RAM available for the most common tasks? How much is left?
- Is there enough disk space available? When will it run out of disk space?
- Is the disk subsystem keeping up? How much more load can it take?
- Can the CPUs keep up with the load? How many spare idle cycles do they have?
- Are the other network services that database access depends on (if any) available? For
  example, if you use Kerberos for authentication, you have to monitor it as well.
- How many context switches are happening while the database is running?
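Several of the checks in this list can be scripted with nothing more than the standard library. The Python sketch below is illustrative only: the helper names `can_connect`, `disk_free_fraction`, and `context_switches_total`, as well as the host and directory used in the `__main__` block, are hypothetical. It assumes the server listens on PostgreSQL's default port 5432, reads the context-switch counter from Linux's `/proc/stat`, and uses `os.getloadavg()`, which exists only on Unix-like systems.

```python
import os
import shutil
import socket


def can_connect(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only shows that something accepts connections on the port,
    not that a database login would succeed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name not resolved, ...
        return False


def disk_free_fraction(path: str) -> float:
    """Fraction (0.0 to 1.0) of the filesystem containing `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def context_switches_total() -> int:
    """Context switches since boot, read from /proc/stat (Linux only).

    Sample it twice and subtract to get a rate.
    """
    with open("/proc/stat") as stat:
        for line in stat:
            if line.startswith("ctxt "):
                return int(line.split()[1])
    raise RuntimeError("no 'ctxt' line in /proc/stat")


if __name__ == "__main__":
    db_host = "localhost"  # hypothetical: your database server
    data_dir = "/"         # hypothetical: the filesystem holding your data directory
    print("accepts connections:", can_connect(db_host))
    print(f"disk free: {100 * disk_free_fraction(data_dir):.1f}%")
    print("load average (1, 5, 15 min):", os.getloadavg())  # Unix only
    if os.path.exists("/proc/stat"):
        print("context switches since boot:", context_switches_total())
```

Run from cron, a script like this only gives point-in-time answers; the history and alerting questions below are what dedicated monitoring tools add on top.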
For most of these things, you are also interested in history, that is, in how things have evolved.
Was everything mostly the same yesterday? Last week? When did the disk usage start
changing rapidly?
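To turn collected history into an answer to "when will it run out of disk space?", one simple approach is to fit a straight line to recent usage samples and extrapolate it. The sketch below assumes growth is roughly linear, which real workloads often violate (bulk loads, or space freed by cleanup jobs), so treat the result as a rough warning horizon rather than a prediction. `hours_until_full` is a hypothetical helper name.

```python
from typing import Optional, Sequence, Tuple


def hours_until_full(samples: Sequence[Tuple[float, float]], capacity: float) -> Optional[float]:
    """Estimate hours until disk usage reaches `capacity`.

    `samples` are (hour, used) pairs from your monitoring history, oldest first,
    with `used` in the same unit as `capacity`. A least-squares line is fitted
    and extrapolated from the latest sample time. Returns None when usage is
    flat or shrinking, i.e. no fill-up is projected.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    # Slope of the fitted line: growth in `used` per hour.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None
    last_t = samples[-1][0]
    fitted_now = mean_u + slope * (last_t - mean_t)  # fitted usage at the latest sample
    return max(0.0, (capacity - fitted_now) / slope)
```

For example, daily samples of 100, 110, and 120 GB on a 200 GB volume (`[(0, 100), (24, 110), (48, 120)]` with `capacity=200`) grow by 10 GB per day, so the function returns roughly 192 hours, or eight days.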
For any larger installation, you probably already have something in place for monitoring the
health of your hosts and network.
The two aspects of monitoring are collecting historical data to see how things have evolved
and getting alerts when things go seriously wrong. Tools based on RRDtool (Round Robin
Database Tool), such as Cacti or Munin, are quite popular for collecting historical information
on all aspects of the servers and presenting it in an easy-to-follow graphical
form. Seeing several statistics on the same timescale can really help when trying to figure out
why the system is behaving the way it is.
The other aspect of monitoring is getting alerts when something goes really wrong and needs
(immediate) attention.
For alerting, one of the most widely used tools is Nagios.
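Nagios runs external check programs ("plugins") and interprets their exit status: 0 means OK, 1 WARNING, 2 CRITICAL, and 3 UNKNOWN. The line the plugin prints is shown as the status text, optionally followed by performance data after a `|`. As a minimal sketch, assuming you want to alert on free disk space, the check logic could look like the Python below; `disk_free_status` and the threshold values are hypothetical, and the standard Nagios plugins distribution already includes a far more complete `check_disk` plugin.

```python
import shutil
from typing import Tuple

# Exit codes that Nagios expects from a check plugin.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def disk_free_status(path: str, warn_pct: float, crit_pct: float) -> Tuple[int, str]:
    """Return (exit_code, status_line) for free space on the filesystem at `path`.

    Assumes crit_pct < warn_pct: less than warn_pct percent free is WARNING,
    less than crit_pct percent free is CRITICAL.
    """
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - cannot stat {path}: {exc}"
    free_pct = 100.0 * usage.free / usage.total
    if free_pct < crit_pct:
        code, label = CRITICAL, "CRITICAL"
    elif free_pct < warn_pct:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text after "|" is performance data that Nagios can store and graph.
    return code, f"DISK {label} - {free_pct:.1f}% free on {path} | free_pct={free_pct:.1f}%"
```

A plugin script would print the returned line and exit with the returned code, for example via `sys.exit(code)`; Nagios then decides whether to notify someone based on that code.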
And then, of course, there is SNMP (Simple Network Management Protocol), which is
supported by a wide array of commercial monitoring solutions. Basic support for monitoring
PostgreSQL through SNMP is found in pgsnmpd, available at the following URL:
http://pgsnmpd.projects.postgresql.org/