The relevant statistics from the previous query are the following:
- Queries Parallelized indicates the number of SELECT statements that were executed in parallel. In the previous scenario, a total of three queries were executed in the current session.
- DML Parallelized indicates the number of DML statements that were executed in parallel. In the current session, no DML statements were executed.
- DFO Trees (where DFO is data flow operator) indicates the number of times a serial execution plan was converted to a parallel plan. In the preceding scenario, a total of three queries were converted from a serial plan to a parallel plan.
- Server Threads indicates the number of parallel query slaves used.
- Allocation Height indicates the DOP for the instance.
- Allocation Width indicates the number of instances involved in this operation.
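These session-level statistics can be inspected directly. The following sketch assumes access to the V$PQ_SESSTAT view, whose STATISTIC column carries the names listed above:

```sql
-- Session-level parallel execution statistics from V$PQ_SESSTAT.
-- LAST_QUERY shows the value for the most recent operation;
-- SESSION_TOTAL accumulates the value across the whole session.
SELECT STATISTIC, LAST_QUERY, SESSION_TOTAL
  FROM V$PQ_SESSTAT
 WHERE STATISTIC IN ('Queries Parallelized', 'DML Parallelized',
                     'DFO Trees', 'Server Threads',
                     'Allocation Height', 'Allocation Width');
```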
Another troubleshooting step is to check the V$ views for slave activity. Executing the following
statement indicates whether slaves are running:
SELECT SLAVE_NAME, STATUS, CPU_SECS_TOTAL
FROM V$PQ_SLAVE;
If the query returns no rows, no slaves are running. If CPU_SECS_TOTAL shows no difference between two
successive runs of the query, there has been no slave CPU activity during the sampling period.
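The two-run sampling can be scripted as follows. This is a sketch: DBMS_LOCK.SLEEP requires EXECUTE privilege on DBMS_LOCK, and the 10-second interval is an arbitrary choice:

```sql
-- First sample of slave CPU time
SELECT SLAVE_NAME, STATUS, CPU_SECS_TOTAL FROM V$PQ_SLAVE;

-- Wait 10 seconds before resampling
EXEC DBMS_LOCK.SLEEP(10);

-- Second sample: if CPU_SECS_TOTAL is unchanged for a slave,
-- it performed no work during the interval
SELECT SLAVE_NAME, STATUS, CPU_SECS_TOTAL FROM V$PQ_SLAVE;
```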
Running queries in parallel, as opposed to running them serially, can benefit performance. However,
when defining parallel queries, the following should be taken into consideration:
- Parallel query has a significant startup cost.
- Multiple slave processes consume more CPU than a single process.
- Each slave process must have its own address space (memory allocation).
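Because of these costs, parallelism is usually worth requesting only for large scans. One way to request it is with the PARALLEL hint; the table name and degree below are illustrative:

```sql
-- Request a degree of parallelism of 4 for a full scan of SALES
-- (SALES is a hypothetical table; 4 is an arbitrary DOP).
SELECT /*+ PARALLEL(s, 4) */ COUNT(*)
  FROM SALES s;
```

Whether the requested degree is actually granted depends on the slave processes and resources available at execution time.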
Parallel query was designed to reduce execution time for queries that have no option but to read large quantities
of data. It maximizes data throughput by spreading the read workload across multiple processes, or slaves. Sort
operations are also managed using the slaves' table queue (TQ) structures.
This increases the CPU load on the system as a whole. If the system is already running at maximum CPU
utilization, parallel query will not get any more out of the query; if no more CPU is available, the optimizer will,
based on the resources available, decide to serialize the parallel operations.
The additional I/O generated by the slave processes reading the data can also stress a heavily loaded I/O
subsystem. Distributing data across multiple disk spindles may help alleviate situations where disk I/O becomes a
bottleneck. Additionally, some queries simply run quicker serially: typically, queries that use index lookups are not
suited to PQO.
Nested Loops vs. Hash/Sort Merge Joins: Typically, parallel query tries to use full table
scans (FTS) to access data, with lookups divided between slaves on a ROWID range
basis. Characteristically, nested loop joins are not efficient with FTS. Hash joins and
sort merge joins are usually much more efficient at processing large amounts of data.
However, there is a downside: hash joins and sort merge joins do not allow row elimination
based on data from a driving row source. This elimination can drastically reduce the data
sets involved and could mean that a serial access path using an index lookup is quicker than a
parallel operation, simply due to the volume of data eliminated.
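The trade-off can be seen by forcing each access path and comparing. Both hints below are sketches; the ORDERS table and the ORD_DATE_IX index are hypothetical names:

```sql
-- Selective predicate: a serial index range scan touches few rows
SELECT /*+ INDEX(o ORD_DATE_IX) */ COUNT(*)
  FROM ORDERS o
 WHERE o.ORDER_DATE = DATE '2011-01-01';

-- The same query as a parallel full table scan reads every row
-- before filtering, which may be slower despite the parallelism
SELECT /*+ FULL(o) PARALLEL(o, 4) */ COUNT(*)
  FROM ORDERS o
 WHERE o.ORDER_DATE = DATE '2011-01-01';
```

Comparing the two execution plans and timings shows whether the data eliminated by the index makes the serial path cheaper.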
Data Splitting: Dividing the data up, passing it through multiple processes, and collating
the results may make the cost of slave generation greater than the cost of retrieving the data serially.