BEGIN;
LOCK TABLE mytable IN SHARE ROW EXCLUSIVE MODE;
DELETE FROM mytable
WHERE ctid NOT IN
      (SELECT min(ctid)
       FROM mytable
       -- GROUP BY must list every column, so only exact duplicates match
       GROUP BY col1, col2, …, colN);
COMMIT;
The preceding query works by grouping together all the rows with identical values and then
keeping only the row with the lowest ctid value in each group. The lowest ctid is the one
nearest to the start of the table, so duplicates are removed from the far end of the table.
When we run VACUUM, we may find that the table gets smaller, because we have removed
rows from the far end.
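As a quick illustration (a hypothetical table and values; the ctids shown are what a fresh,
empty table would normally assign), each group of identical rows keeps only its lowest ctid:

CREATE TABLE mytable (col1 integer, col2 text);
INSERT INTO mytable VALUES (1, 'a'), (1, 'a'), (2, 'b');

SELECT ctid, * FROM mytable;
--  ctid  | col1 | col2
-- (0,1)  |    1 | a      <- min(ctid) for its group, so it is kept
-- (0,2)  |    1 | a      <- deleted by the query above
-- (0,3)  |    2 | b      <- only row in its group, so it is kept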
The BEGIN and COMMIT commands wrap the LOCK and DELETE commands into a
single transaction. The LOCK command applies a sufficiently high level of lock against
the table to prevent UPDATE and DELETE commands from being executed against it while
we remove the duplicates.
There's more...
Locking the table against changes for long periods may not be acceptable while we remove
duplicate rows, which poses some fairly hard problems on large tables. In that case, we need
to do things slightly differently:
- Identify the rows to be deleted, and save them into a side table (one way to do this
  is sketched just after this list)
- Build an index on the main table to speed up access to the rows
- Write a program that reads the rows from the side table in a loop, performing a series
  of smaller transactions (a sketch of such a program appears at the end of this section);
  each iteration should:
  - Start a new transaction
  - Read a set of rows from the side table that match rows in the main table
  - Select those rows from the main table FOR UPDATE, relying on the index to make those
    accesses happen quickly
  - Delete the appropriate rows
  - Commit, and then loop again
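For the first two steps, something such as the following could work (the side-table name
dup_keys and the columns col1 and col2 are illustrative, not fixed by the recipe):

CREATE TABLE dup_keys AS
SELECT col1, col2
FROM mytable
GROUP BY col1, col2
HAVING count(*) > 1;

CREATE INDEX ON mytable (col1, col2);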
The aforementioned program can't be written as a database function, as we can't have
multiple transactions in a function. We need multiple transactions to ensure we hold locks on
each row for the shortest possible duration.
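As an illustration, here is a minimal sketch of such a batch-deletion program in Python with
psycopg2, continuing the hypothetical dup_keys side table from above (the language, driver,
batch size, and all object names are assumptions, not part of the recipe):

import psycopg2

BATCH_SIZE = 1000

conn = psycopg2.connect("dbname=mydb")   # illustrative connection string

while True:
    with conn:                           # one short transaction per batch
        with conn.cursor() as cur:
            # Take the next batch of duplicate keys from the side table,
            # removing them so no batch is processed twice.
            cur.execute("""
                DELETE FROM dup_keys
                WHERE ctid IN (SELECT ctid FROM dup_keys LIMIT %s)
                RETURNING col1, col2
            """, (BATCH_SIZE,))
            keys = cur.fetchall()
            if not keys:
                break                    # side table exhausted; we're done

            for col1, col2 in keys:
                # Lock the matching rows, using the index for fast access.
                cur.execute("""
                    SELECT ctid FROM mytable
                    WHERE col1 = %s AND col2 = %s
                    ORDER BY ctid
                    FOR UPDATE
                """, (col1, col2))
                ctids = [row[0] for row in cur.fetchall()]

                # Delete every copy except the one with the lowest ctid.
                for ctid in ctids[1:]:
                    cur.execute("DELETE FROM mytable WHERE ctid = %s",
                                (ctid,))
    # The transaction commits here, releasing the row locks before
    # the next batch begins.

conn.close()

Because each batch commits before the next begins, the FOR UPDATE row locks are held only
briefly. If the program fails mid-batch, that batch's transaction rolls back, restoring its
keys to the side table, so the program can simply be restarted.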
Preventing duplicate rows
Preventing duplicate rows is one of the most important aspects of data quality for any
database. PostgreSQL offers some useful features in this area, going beyond what most
other relational databases provide.
 