Query Performance Tuning - PostgreSQL: Up and Running

Database Reference

In-Depth Information

Example 9-9. Disable sequential scan, coerce index use

set enable_seqscan = false ;

EXPLAIN ( ANALYZE )

SELECT *

FROM census . lu_fact_types

WHERE fact_subcats && '{White alone, Black alone}' :: varchar [];

Bitmap Heap Scan on lu_fact_types (cost=12.02..14.04 rows=2 width=200) (actual

time=0.058..0.058 rows=2 loops=1) Recheck Cond: (fact_subcats && '{"White

alone","Black alone"}'::character varying[]) Heap Blocks: exact=1 -> Bitmap Index

Scan on idx_lu_fact_types

(cost=0.00..12.02 rows=2 width=0) (actual time=0.048..0.048 rows=2 loops=1)

Index Cond: (fact_subcats && '{"White alone","Black alone"}'::character

varying[])

Planning time: 0.230 ms

Execution time: 0.119 ms

From this plan, we learn that our index can be used, but because the estimated cost is

more than doing a sequential scan, the planner under normal circumstances will opt

for the sequential scan. The planner was right in its assessment because our index exe‐

cution time turns out to be a little more than a sequential scan. As we add more data to

our table, we'll probably find that the planner changes strategies to an index scan.

In contrast to the previous example, suppose we were to write a query of the form:

SELECT * FROM census . lu_fact_types WHERE 'White alone' = ANY ( fact_subcats );

We would discover that, regardless of how we set enable_seqscan , the planner will

always perform a sequential scan because the index we have in place can't service this

query. So it is important to consider which indexes will be useful and to write queries

to take advantage of them. And experiment, experiment, experiment!

Table Statistics

Despite what you might think or hope, the query planner is not a magician. Its decisions

follow prescribed logic that's far beyond the scope of this topic. The rules that the planner

follows depend heavily on the current state of the data. The planner can't possibly scan

all the tables and rows prior to formulating its plan. That would be self-defeating. In‐

stead, it relies on aggregated statistics about the data.

Having accurate and current stats is crucial for the planner to make the right decision.

If stats differ greatly from reality, the planner will often come up with bad plans, the

most detrimental of these being unnecessary sequential table scans. Generally, only

about 20 percent of the entire table is sampled to produce stats. This percentage could

be even lower for very large tables. You can control the number of rows sampled on a

column-by-column basis by setting the STATISTICS value.

Search WWH ::

Custom Search

Home