Monitoring and Diagnosis - PostgreSQL 9 Administration


Monitoring is important

Databases are not isolated entities. They live on computer hardware using CPUs, RAM, and disk subsystems. Users access the database using networks. Depending on the setup, the databases themselves may need network resources to function, whether for performing authentication checks when users log in, using disks that are mounted over the network (not generally recommended), or making remote function calls to other databases.

This means that monitoring only the database is not enough. At a minimum, you should also monitor everything directly involved in using the database, such as the following:

• Is the database host available? Does it accept connections?
• How much of the network bandwidth is in use? Have there been network interruptions and dropped connections?
• Is there enough RAM available for the most common tasks? How much is left?
• Is there enough disk space available? When will it run out of disk space?
• Is the disk subsystem keeping up? How much more load can it take?
• Can the CPU keep up with the load? How many spare idle cycles do the CPUs have?
• Are other network services that database access depends on (if any) available? For example, if you use Kerberos for authentication, you have to monitor it as well.
• How many context switches are happening when the database is running?
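Two of the questions above (does the host accept connections, and how much disk space is left) can be probed with a few lines of code. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the port default of 5432 assumes PostgreSQL is listening on its standard port. Note that a successful TCP connection only shows that something is listening, not that the database can authenticate users or run queries.

```python
import shutil
import socket


def accepts_connections(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Assumption: 5432 is used as the default because it is PostgreSQL's
    standard port; adjust it for your installation.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def disk_free_fraction(path: str = "/") -> float:
    """Return the fraction (0.0 to 1.0) of the filesystem holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

For example, `accepts_connections("db.example.com")` (a placeholder host name) could be called from a scheduled job, together with `disk_free_fraction()` on the path of the data directory.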

For most of these things, you are also interested in history, that is, how things have evolved. Was everything mostly the same yesterday? Last week? When did the disk usage start changing rapidly?
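History is also what makes a question such as "when will it run out of disk space?" answerable. As a rough sketch, assuming you have recorded (timestamp, bytes used) samples yourself, a straight-line extrapolation between the oldest and newest sample gives a first estimate. The function name and the sample format are hypothetical, and real growth is rarely perfectly linear.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple

Sample = Tuple[datetime, int]  # (time of measurement, bytes used)


def estimate_time_until_full(
    samples: Sequence[Sample], capacity_bytes: int
) -> Optional[timedelta]:
    """Extrapolate linearly from the first to the last sample.

    Returns None when there are fewer than two samples or when usage
    is flat or shrinking, because no meaningful estimate exists then.
    """
    if len(samples) < 2:
        return None
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    elapsed = (t1 - t0).total_seconds()
    growth = used1 - used0
    if elapsed <= 0 or growth <= 0:
        return None
    remaining = capacity_bytes - used1
    if remaining <= 0:
        return timedelta(0)  # already full
    # seconds left = remaining bytes / (bytes grown per second)
    return timedelta(seconds=remaining * elapsed / growth)
```

With samples taken ten days apart showing growth from 100 to 200 units on a 500-unit disk, the estimate is about 30 days.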

For any larger installation, you probably already have something in place for monitoring the health of your hosts and network.

The two aspects of monitoring are collecting historical data to see how things have evolved and getting alerts when things go seriously wrong. Tools based on RRDtool (Round Robin Database Tool), such as Cacti or Munin, are quite popular for collecting historical information on all aspects of the servers and presenting it in an easy-to-follow graphical form. Seeing several statistics on the same timescale can really help when trying to figure out why the system is behaving the way it is.
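The "round robin" in RRDtool's name refers to storage of a fixed size, in which new samples overwrite the oldest ones so that the history never grows without bound. The sketch below illustrates only that idea with a bounded deque. The class name is hypothetical, and this is not RRDtool's file format or API.

```python
from collections import deque
from typing import Deque, List, Tuple


class RoundRobinSeries:
    """Keep only the most recent `slots` samples of one statistic."""

    def __init__(self, slots: int) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._samples: Deque[Tuple[float, float]] = deque(maxlen=slots)

    def add(self, timestamp: float, value: float) -> None:
        """Record one sample, overwriting the oldest if the series is full."""
        self._samples.append((timestamp, value))

    def values(self) -> List[float]:
        """Return the retained values, oldest first."""
        return [value for _, value in self._samples]
```

A series created with `RoundRobinSeries(slots=3)` that receives five samples retains only the last three.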

For alerting, that is, getting notified when something goes really wrong and needs (immediate) attention, one of the most widely used tools is Nagios.
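Nagios runs small check programs (plugins) and reads their result from the exit status: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN, with a one-line status message on standard output. The following is a minimal sketch of a disk usage check written to that convention. The function names and the 80% and 90% thresholds are assumptions chosen for illustration, not values from any standard plugin.

```python
import shutil
from typing import Tuple

# Exit statuses defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_disk_usage(
    used_percent: float, warn_at: float = 80.0, crit_at: float = 90.0
) -> Tuple[int, str]:
    """Map a usage percentage to a (status code, status line) pair.

    The thresholds are illustrative defaults, not recommendations.
    """
    if used_percent >= crit_at:
        return CRITICAL, f"DISK CRITICAL - {used_percent:.1f}% used"
    if used_percent >= warn_at:
        return WARNING, f"DISK WARNING - {used_percent:.1f}% used"
    return OK, f"DISK OK - {used_percent:.1f}% used"


def main(path: str = "/") -> int:
    """Check the filesystem holding `path`, print the status line, return the code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, message = classify_disk_usage(100.0 * usage.used / usage.total)
    print(message)
    return code

# To use this as a plugin script, end the file with: raise SystemExit(main())
```

Keeping the classification separate from the measurement makes the thresholds easy to test without touching a real disk.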

And then, of course, there is SNMP (Simple Network Management Protocol), which is supported by a wide array of commercial monitoring solutions. Basic support for monitoring PostgreSQL through SNMP is found in pgsnmpd, available at the following URL:

http://pgsnmpd.projects.postgresql.org/
