Database Reference
In-Depth Information
Identifying and fixing bloated tables
and indexes
PostgreSQL implements MVCC (Multi-Version Concurrency Control), that allows users
to read data at the same time as writers make changes. This is an important feature for
concurrency in database applications, as it can allow the following:
F
Better performance because of fewer locks
F
Greatly reduced deadlocking
F
Simplified application design and management
MVCC is a core part of PostgreSQL and cannot be turned off, nor would you really want it to
be. The internals of MVCC have some implications for the DBA that need to be understood.
The price for these benefits is that SQL
UPDATE
command can cause tables and indexes to
grow in size because they leave behind dead row versions. DELETEs and aborted INSERTs
take up space that must be reclaimed by garbage collection. VACUUM is the mechanism by
which we reclaim space, though there is also another internals feature named HOT, which
does much of this work automatically for us.
Knowing this, many people become worried by and spend much time trying to rid themselves
of dead row versions. Many users will be familiar with tools to perform tasks, such as
defragmentation, shrinking, reorganization, and table optimization. These things are
necessary, though you should not be unduly worried by the need for vacuuming in PostgreSQL.
Many users execute VACUUM far too frequently, while at the same time complaining about the
cost of doing so.
This recipe is all about understanding when you need to run VACUUM by estimating the
amount of bloat in tables and indexes.
How to do it...
The best way to understand things is to look at things the same way that autovacuum does. Use
the following query, derived by Greg Smith for his
PostgreSQL 9.0 High Performance
topic, also
by
Packt
. The calculations are derived directly from the autovacuum documentation.
CREATE OR REPLACE VIEW av_needed AS
SELECT *,
n_dead_tup > av_threshold AS "av_needed",
CASE WHEN reltuples > 0
THEN round(100.0 * n_dead_tup / (reltuples))
ELSE 0
END AS pct_dead
FROM
(SELECT