Indexes - Expert Oracle Database Architecture

Database Reference

In-Depth Information

random I/O. By “scattered,” I mean that the index will tell us to read block 1, block 1,000, block 205, block 321, block 1,

block 1,032, block 1, and so on—it won't ask us to read block 1, then block 2, and then block 3 in a consecutive manner.

We will tend to read and reread blocks in a very haphazard fashion. This single block I/O can be very slow.

As a simplistic example of this, let's say we are reading that thin table via an index, and we are going to read

20 percent of the rows. Assume we have 100,000 rows in the table. Twenty percent of that is 20,000 rows. If the rows

are about 80 bytes apiece in size, on a database with an 8KB block size, we will find about 100 rows per block. That

means the table has approximately 1,000 blocks. From here, the math is very easy. We are going to read 20,000 rows

via the index; this will mean quite likely 20,000 TABLE ACCESS BY ROWID operations. We will process 20,000 table

blocks to execute this query. There are only about 1,000 blocks in the entire table, however! We would end up reading

and processing each block in the table on average 20 times. Even if we increased the size of the row by an order of

magnitude to 800 bytes per row, and 10 rows per block, we now have 10,000 blocks in the table. Index accesses for

20,000 rows would cause us to still read each block on average two times. In this case, a full table scan will be much

more efficient than using an index, as it has to touch each block only once. Any query that used this index to access

the data would not be very efficient until it accesses on average less than 5 percent of the data for the 800-byte column

(then we access about 5,000 blocks) and even less for the 80-byte column (about 0.5 percent or less).

Physical Organization

How the data is organized physically on disk deeply impacts these calculations, as it materially affects how expensive

(or inexpensive) index access will be. Suppose you have a table where the rows have a primary key populated by a

sequence. As data is added to the table, rows with sequential sequence numbers might be, in general, next to each other.

■ the use of features such as assM or multiple FREELIST / FREELIST GROUPS will affect how the data is organized

on disk. those features tend to spread the data out, and this natural clustering by primary key may not be observed.

Note

The table is naturally clustered in order by the primary key (since the data is added in more or less that order).

It will not be strictly clustered in order by the key, of course (we would have to use an IOT to achieve that); in general,

rows with primary keys that are close in value will be close together in physical proximity. When you issue the query

select * from T where primary_key between :x and :y

the rows you want are typically located on the same blocks. In this case, an index range scan may be useful even if it

accesses a large percentage of rows, simply because the database blocks that we need to read and reread will most

likely be cached since the data is co-located. On the other hand, if the rows are not co-located, using that same index

may be disastrous for performance. A small demonstration will drive this fact home. We'll start with a table that is

pretty much ordered by its primary key:

EODA@ORA12CR1> create table colocated ( x int, y varchar2(80) );

Table created.

EODA@ORA12CR1> begin

2 for i in 1 .. 100000

3 loop

4 insert into colocated(x,y)

5 values (i, rpad(dbms_random.random,75,'*') );

6 end loop;

7 end;

8 /

PL/SQL procedure successfully completed.

Search WWH ::

Custom Search

Home