Prior to Oracle Database 11g Release 2, parallel query slaves bypassed the buffer cache and performed direct path I/O to read data from disk into the PX server's private workspace. Starting with Oracle Database 11g Release 2, Oracle enhanced the parallel query architecture significantly. Taking advantage of the large amounts of memory available on modern servers and the advances in memory management, parallel queries can leverage the aggregated database buffer cache available in RAC to cache objects of very large sizes. With large amounts of data stored in memory, Oracle now uses the in-memory PX feature.
With in-memory PX, when a SQL statement is issued in parallel, a check is made to determine whether the objects accessed by the statement should be cached in the aggregated buffer cache of the cluster. The decision to use the aggregated buffer cache is based on the size of the object, the frequency at which the object changes, the frequency at which the object is accessed, and the size of the aggregated buffer cache. If these conditions are met, the object is broken into smaller fragments that are distributed across all participating nodes in the cluster and stored in their respective buffer caches. The size of a fragment depends on the data distribution strategy used to store the data. For example, if the object uses hash partitioning, each partition is treated as a fragment; otherwise, fragment boundaries are based on the file number and extent number of the database object.4
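In-memory PX is part of the automatic parallelism framework introduced in 11g Release 2 and is activated when PARALLEL_DEGREE_POLICY is set to AUTO. The following is a minimal sketch, assuming a test session and a hypothetical SALES table; whether the object is actually cached remains the optimizer's decision based on the criteria above.

-- Enable automatic DOP, statement queuing, and in-memory parallel execution
-- for the current session (the table name below is hypothetical).
ALTER SESSION SET parallel_degree_policy = AUTO;

-- If the object qualifies (size, change rate, access frequency relative to the
-- aggregated buffer cache), its fragments are read into the buffer caches of
-- the participating instances rather than via direct path reads.
SELECT /*+ PARALLEL(s) */ COUNT(*)
FROM   sales s;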
Fragments mapped to a specific node in the cluster create an affinity to the buffer cache of that instance. This means that when data from these fragments is required, the PX servers on the nodes where the data resides access the data in their local buffer caches and return only the results to the instance where the statement was issued (where the QC is running), rather than shipping all the data required by the query. This architecture reduces interconnect traffic significantly.
The output from event 43905 at level 124 also illustrates the slave distribution among the various instances in the RAC cluster. In the following output, there are three Oracle instances, and 96 slaves have to be distributed among them. As illustrated in the highlighted sections of the output, the optimizer decides to spread the slaves equally across the three instances in the cluster.
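Such a trace can be generated at the session level before running the parallel statement. The sketch below assumes the standard events syntax and the same hypothetical SALES table; the resulting trace file is written to the session's trace directory.

-- Enable the PX allocation trace, run a parallel statement, then turn it off.
ALTER SESSION SET EVENTS '43905 trace name context forever, level 124';

SELECT /*+ PARALLEL(s, 96) */ COUNT(*) FROM sales s;

ALTER SESSION SET EVENTS '43905 trace name context off';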
In the following output, the kxfpiinfo step shows that the optimizer has taken into consideration the number of CPUs available on each server in the cluster, and the kxfpgsg step then allocates 32 slaves per instance (32*3), for a total of 96 slaves:
kxfrialo
        threads requested = 96 (from kxfrComputeThread())
kxfrialo
        adjusted no. threads = 96 (from kxfrAdjustDOP())
kxfrialo
        Start: allocating requested 96 slaves
kxfrAllocSlaves
        DOP trace -- call kxfpgsg to get 96 slaves
        num server requested = 96 load balancing:off adaptive:off
kxfplist
        Getting instance info for open group
kxfpiinfo
        inst[cpus:mxslv]
        1[16:320] 2[16:320] 3[16:320]
kxfpsori
        Sorted: 2(2:2) 1(1:1) 3(0:0)
kxfpgsg
        getting 2 sets of 96 threads, client parallel query execution flg=0x0
        Height=0, Affinity List Size=0, inst_total=3, coord=1
        Insts    2   1   3
        Threads  32  32  32
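The same distribution can be verified while the statement is running by counting the PX servers allocated on each instance for the coordinator's session. This is a sketch against GV$PX_SESSION; the QC SID of 1234 is a placeholder for the coordinator's session ID.

-- Count the PX slaves per instance for a given query coordinator (QC) session.
SELECT inst_id, COUNT(*) AS px_slaves
FROM   gv$px_session
WHERE  qcsid = 1234
  AND  sid <> qcsid        -- exclude the coordinator itself
GROUP  BY inst_id
ORDER  BY inst_id;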
4 “Parallel Execution Fundamentals in Oracle Database 11g Release 2,” an Oracle White Paper, November 2009.
 