Declarative Query Processing in Relational Database Systems - Automated Physical Database Design and Tuning

Database Reference

In-Depth Information

over multiple columns of the underlying table. In that case, internal

nodes are defined on ranges of composite values from the index columns.

Querying indexes. Consider an index I

=

(

c 1 ,

c 2 ,... ,

c n )

. The two main

querying operations we can perform over such an index are Index Scans

and Index Seeks . Scanning an index locates the left-most leaf node in

the B + -tree and returns the values of the columns in the index without

actually going to the heap or clustered index. Since non-clustered in-

dexes are typically defined over a subset of the table columns, they are

smaller than the corresponding table heap or clustered index. Therefore,

Index Scans are good evaluation alternatives if the query requires only

a subset of columns present in the index pages. The second operator,

Index Seek , provides an ecient way to evaluate predicates of the form

c 1 = v 1 (or, generally, c 1 = v 1 ∧ ... ∧

T

c k + 1 ≤ v k + 1 ). We can evalu-

ate this predicate eciently by seeking the index, that is, traversing the

tree from the root to the first tuple in a leaf node that satisfies all predi-

cates (if there is such a tuple) and performing the analogous of an Index

Scan as previously described from that point until the index leaf page

where the predicate is no longer satisfied. Unlike binary search trees,

B + trees have very high fan-out (typically in the order of hundreds),

which reduces the number of I/O operations required to find an element

in the tree. Thus, a height of at most five suces for virtually all table

sizes. Therefore, seeking an index can be done using a handful of I/Os,

which is very ecient. Moreover, a large fraction of index internal nodes

might already be resident in memory. If a query requires columns that

are not part of the index definition, a subsequent lookup to the table

heap or clustered index is required. In that case, the RID of each tuple

is used to look up into the table heap (or an additional seek for the case

of clustered indexes). Since tuples in heaps or clustered indexes are not

necessarily laid out in the same order as tuples in secondary indexes,

such lookup typically requires an additional disk I/O, which can quickly

become expensive.

Updating indexes. Each time a tuple is modified, inserted, or deleted, all

relevant indexes are updated as well. For inserts and deletes on a table,

all indexes defined over such table are updated. For an update state-

ment, only those indexes that are defined over an update column are

modified. Thus, INSERT/DELETE/UPDATE statements are the main factor

against wider index adoption (the second drawback being storage con-

straints). An update-intensive query workload may suffer performance

degradation due to the presence of multiple indexes.

Key vs. included columns. Some database systems have the ability to dif-

ferentiate index columns into key columns and included columns . The

main difference between these two types of columns is that included

columns are not part of B + -tree internal nodes but are present in the

leaf nodes. Therefore, we cannot use an index to evaluate predicates that

c k = v k ∧

Automated Physical Database Design and Tuning

Search WWH ::

Custom Search

Home