The relevant statistics from the previous query are the following:
- Queries Parallelized indicates the number of SELECT statements that were executed in parallel. In the previous scenario, a total of three queries were executed in the current session.
- DML Parallelized indicates the number of DML statements that were executed in parallel. In the current session, no DML statements were executed.
- DFO Trees (where DFO is data flow operator) indicates the number of times a serial execution plan was converted to a parallel plan. In the preceding scenario, a total of three queries were converted from a serial plan to a parallel plan.
- Server Threads indicates the number of parallel query slaves used.
- Allocation Height indicates the DOP for the instance.
- Allocation Width indicates the number of instances involved in this operation.
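These session-level statistics can be inspected directly. The following sketch assumes access to the V$PQ_SESSTAT view, whose STATISTIC column carries the names listed above:

```sql
-- Session-level parallel execution statistics from V$PQ_SESSTAT.
-- LAST_QUERY shows the value for the most recent operation;
-- SESSION_TOTAL accumulates the value across the whole session.
SELECT STATISTIC, LAST_QUERY, SESSION_TOTAL
  FROM V$PQ_SESSTAT
 WHERE STATISTIC IN ('Queries Parallelized', 'DML Parallelized',
                     'DFO Trees', 'Server Threads',
                     'Allocation Height', 'Allocation Width');
```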
Another troubleshooting step is to check the V$ views for slave activity. Executing the following
statement indicates whether slaves are running:
SELECT SLAVE_NAME, STATUS, CPU_SECS_TOTAL
FROM V$PQ_SLAVE;
If the query returns no rows, no slaves are running. If CPU_SECS_TOTAL shows no difference between two
successive runs of the query, there has been no slave CPU activity during the sampling period.
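The two-run sampling can be scripted as follows. This is a sketch: DBMS_LOCK.SLEEP requires EXECUTE privilege on DBMS_LOCK, and the 10-second interval is an arbitrary choice:

```sql
-- First sample of slave CPU time
SELECT SLAVE_NAME, STATUS, CPU_SECS_TOTAL FROM V$PQ_SLAVE;

-- Wait 10 seconds before resampling
EXEC DBMS_LOCK.SLEEP(10);

-- Second sample: if CPU_SECS_TOTAL is unchanged for a slave,
-- it performed no work during the interval
SELECT SLAVE_NAME, STATUS, CPU_SECS_TOTAL FROM V$PQ_SLAVE;
```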
Running queries in parallel, as opposed to running them serially, can benefit performance. However,
when defining parallel queries, the following should be taken into consideration:
- Parallel query has a significant startup cost.
- Multiple slave processes consume more CPU than a single process.
- Each slave process must have its own address space (memory allocation).
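Because of these costs, parallelism is usually worth requesting only for large scans. One way to request it is with the PARALLEL hint; the table name and degree below are illustrative:

```sql
-- Request a degree of parallelism of 4 for a full scan of SALES
-- (SALES is a hypothetical table; 4 is an arbitrary DOP).
SELECT /*+ PARALLEL(s, 4) */ COUNT(*)
  FROM SALES s;
```

Whether the requested degree is actually granted depends on the slave processes and resources available at execution time.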
Parallel query was designed to reduce execution time for queries that have no option but to read large quantities
of data. It maximizes data throughput by spreading the read workload across multiple processes, or slaves. Sort
operations are also managed using the slaves' table queue (TQ) structures.
This increases the CPU load on the system as a whole. If the system is already running at maximum CPU
utilization, parallel query will not get any more out of the query; if no more CPU is available, the optimizer will,
based on the resources available, decide to serialize the parallel operations.
The additional I/O generated by the slave processes reading the data can also stress a heavily loaded I/O
subsystem. Distributing data across multiple disk spindles may help alleviate situations where disk I/O becomes a
bottleneck. Additionally, some queries simply run quicker serially: typically, queries that use index lookups are not
suited to PQO.
Nested Loops vs. Hash/Sort Merge Joins: Typically, parallel query tries to use full table
scans (FTS) to access data, with lookups divided between slaves on a ROWID range
basis. Characteristically, nested loop joins are not efficient with FTS. Hash joins and
sort merge joins are usually much more efficient at processing large amounts of data.
However, there is a downside: hash joins and sort merge joins do not allow row elimination
based on data from a driving row source. This elimination can drastically reduce the data
sets involved and could mean that a serial access path using an index lookup is quicker than a
parallel operation, simply due to the volume of data eliminated.
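The trade-off can be seen by forcing each access path and comparing. Both hints below are sketches; the ORDERS table and the ORD_DATE_IX index are hypothetical names:

```sql
-- Selective predicate: a serial index range scan touches few rows
SELECT /*+ INDEX(o ORD_DATE_IX) */ COUNT(*)
  FROM ORDERS o
 WHERE o.ORDER_DATE = DATE '2011-01-01';

-- The same query as a parallel full table scan reads every row
-- before filtering, which may be slower despite the parallelism
SELECT /*+ FULL(o) PARALLEL(o, 4) */ COUNT(*)
  FROM ORDERS o
 WHERE o.ORDER_DATE = DATE '2011-01-01';
```

Comparing the two execution plans and timings shows whether the data eliminated by the index makes the serial path cheaper.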
Data Splitting: Dividing the data up, passing it through multiple processes, and collating
the results may make the cost of slave generation greater than the cost of retrieving the data serially.