Database Reference
In-Depth Information
random I/O. By “scattered,” I mean that the index will tell us to read block 1, block 1,000, block 205, block 321, block 1,
block 1,032, block 1, and so on—it won't ask us to read block 1, then block 2, and then block 3 in a consecutive manner.
We will tend to read and reread blocks in a very haphazard fashion. This single block I/O can be very slow.
As a simplistic example of this, let's say we are reading that thin table via an index, and we are going to read
20 percent of the rows. Assume we have 100,000 rows in the table. Twenty percent of that is 20,000 rows. If the rows
are about 80 bytes apiece in size, on a database with an 8KB block size, we will find about 100 rows per block. That
means the table has approximately 1,000 blocks. From here, the math is very easy. We are going to read 20,000 rows
via the index; this will mean quite likely 20,000 TABLE ACCESS BY ROWID operations. We will process 20,000 table
blocks to execute this query. There are only about 1,000 blocks in the entire table, however! We would end up reading
and processing each block in the table on average 20 times. Even if we increased the size of the row by an order of
magnitude to 800 bytes per row, and 10 rows per block, we now have 10,000 blocks in the table. Index accesses for
20,000 rows would cause us to still read each block on average two times. In this case, a full table scan will be much
more efficient than using an index, as it has to touch each block only once. Any query that used this index to access
the data would not be very efficient until it accesses on average less than 5 percent of the data for the 800-byte column
(then we access about 5,000 blocks) and even less for the 80-byte column (about 0.5 percent or less).
Physical Organization
How the data is organized physically on disk deeply impacts these calculations, as it materially affects how expensive
(or inexpensive) index access will be. Suppose you have a table where the rows have a primary key populated by a
sequence. As data is added to the table, rows with sequential sequence numbers might be, in general, next to each other.
the use of features such as assM or multiple FREELIST / FREELIST GROUPS will affect how the data is organized
on disk. those features tend to spread the data out, and this natural clustering by primary key may not be observed.
Note
The table is naturally clustered in order by the primary key (since the data is added in more or less that order).
It will not be strictly clustered in order by the key, of course (we would have to use an IOT to achieve that); in general,
rows with primary keys that are close in value will be close together in physical proximity. When you issue the query
select * from T where primary_key between :x and :y
the rows you want are typically located on the same blocks. In this case, an index range scan may be useful even if it
accesses a large percentage of rows, simply because the database blocks that we need to read and reread will most
likely be cached since the data is co-located. On the other hand, if the rows are not co-located, using that same index
may be disastrous for performance. A small demonstration will drive this fact home. We'll start with a table that is
pretty much ordered by its primary key:
EODA@ORA12CR1> create table colocated ( x int, y varchar2(80) );
Table created.
EODA@ORA12CR1> begin
2 for i in 1 .. 100000
3 loop
4 insert into colocated(x,y)
5 values (i, rpad(dbms_random.random,75,'*') );
6 end loop;
7 end;
8 /
PL/SQL procedure successfully completed.
 
 
Search WWH ::




Custom Search