Database Reference
In-Depth Information
Procedural Parallelism
I would like to discuss two types of procedural parallelism:
•
Parallel pipelined functions, which is a feature of Oracle.
•
Do-it-yourself (DIY) parallelism, which is the application to your own applications of the
same techniques that Oracle applies to parallel full table scans. DIY parallelism is more
of a development technique than anything built into Oracle directly in Oracle Database
11
g
Release 1 and before - and a native database feature in Oracle Database 11
g
Release 2 and later.
Often you'll find that applications—typically batch processes—designed to execute serially will look something
like the following procedure:
Create procedure process_data
As
Begin
For x in ( select * from some_table )
Perform complex process on X
Update some other table, or insert the record somewhere else
End loop
end
In this case, Oracle's parallel query or PDML won't help a bit (in fact, parallel execution of the SQL by Oracle
here would likely only cause the database to consume more resources and take longer). If Oracle were to execute
the simple
SELECT * FROM SOME_TABLE
in parallel, it would provide this algorithm no apparent increase in speed
whatsoever. If Oracle were to perform in parallel the
UPDATE
or
INSERT
after the complex process, it would have no
positive affect (it is a single-row
UPDATE
/
INSERT
, after all).
There is one obvious thing you could do here: use array processing for the
UPDATE
/
INSERT
after the complex
process. However, that isn't going to give you a 50 percent reduction or more in runtime, which is often what you're
looking for. Don't get me wrong, you definitely want to implement array processing for the modifications here, but it
won't make this process run two, three, four, or more times faster.
Now, suppose this process runs at night on a machine with four CPUs, and it is the only activity taking place. You
have observed that only one CPU is partially used on this system, and the disk system is not being used very much at
all. Further, this process is taking hours, and every day it takes a little longer as more data is added. You need to reduce
the runtime dramatically—it needs to run four or eight times faster—so incremental percentage increases will not be
sufficient. What can you do?
There are two approaches you can take. One approach is to implement a parallel pipelined function, whereby
Oracle will decide on appropriate degrees of parallelism (assuming you have opted for that, which would be
recommended). Oracle will create the sessions, coordinate them, and run them, very much like the previous example
with parallel DDL or parallel DML where, by using
CREATE TABLE AS SELECT
or
INSERT /*+ APPEND */
, Oracle
fully automated parallel direct path loads for us. The other approach is DIY parallelism. We'll take a look at both
approaches in the sections that follow.
Parallel Pipelined Functions
We'd like to take that very serial process
PROCESS_DATA
from earlier and have Oracle execute it in parallel for us.
To accomplish this, we need to turn the routine inside out. Instead of selecting rows from some table, processing
them, and inserting them into another table, we will insert into another table the results of fetching some rows and
processing them. We will remove the
INSERT
at the bottom of that loop and replace it in the code with a
PIPE ROW
clause. The
PIPE ROW
clause allows our PL/SQL routine to generate table data as its output, so we'll be able to
SELECT