Database Reference
In-Depth Information
The task we were performing as an example was to add a column onto many tables. In the
previous example, we were adding the column with no specified default; so the new column
will have a NULL value, and, as a result, with ALTER TABLE it will run very quickly, even on
large tables. If we change the ALTER TABLE statement so that we specify a default, then the
SQL will need to re-write the whole table. So, the run time will vary according to table size
(approximately, though also by number and type of indexes).
Now that our subtasks vary in runtime according to size, we need to be more careful splitting
up the subtasks, so that we end up with multiple scripts that will run for about the same time.
If we already know that we have just a few big tables, it's easy to just split those out manually
into their own scripts.
If the database has many large tables, then we can sort SQL statements by size and then
distribute them using round-robin distribution into multiple subscripts that will have approximately
similar runtime. The following two SQL statements are an example of this technique:
\t on
\o script0.sql
SELECT sql FROM (
SELECT 'ALTER TABLE '|| n.nspname || '.' || c.relname ||
' ADD COLUMN last_update_timestamp TIMESTAMP WITH TIME ZONE DEFAULT
now();' as sql
,row_number() OVER (ORDER BY pg_relation_size(c.oid))
FROM pg_class c
JOIN pg_namespace n
ON c.relnamespace = n.oid
WHERE n.nspname = 'test'
ORDER BY 2 DESC) as s
WHERE row_number % 2 = 0;
\o script1.sql
SELECT sql FROM (
SELECT 'ALTER TABLE '|| n.nspname || '.' || c.relname ||
' ADD COLUMN last_update_timestamp TIMESTAMP WITH TIME ZONE DEFAULT
now();' as sql
,row_number() OVER (ORDER BY pg_relation_size(c.oid))
FROM pg_class c
JOIN pg_namespace n
ON c.relnamespace = n.oid
WHERE n.nspname = 'test'
ORDER BY 2 DESC) as s
WHERE row_number % 2 = 1;
Then, execute the jobs in parallel, like the following:
$ psql -f script0.sql &
$ psql -f script1.sql &
 
Search WWH ::




Custom Search