Indexing for High Performance - High Performance MySQL

Databases Reference

In-Depth Information

to reorganize the table with OPTIMIZE TABLE after loading a lot of data if you didn't

load the rows in primary key order.

• Updating the clustered index columns is expensive, because it forces InnoDB to

move each updated row to a new location.

• Tables built upon clustered indexes are subject to page splits when new rows are

inserted, or when a row's primary key is updated such that the row must be moved.

A page split happens when a row's key value dictates that the row must be placed

into a page that is full of data. The storage engine must split the page into two to

accommodate the row. Page splits can cause a table to use more space on disk.

• Clustered tables can be slower for full table scans, especially if rows are less densely

packed or stored nonsequentially because of page splits.

• Secondary (nonclustered) indexes can be larger than you might expect, because

their leaf nodes contain the primary key columns of the referenced rows.

• Secondary index accesses require two index lookups instead of one.

The last point can be a bit confusing. Why would a secondary index require two index

lookups? The answer lies in the nature of the “row pointers” the secondary index stores.

Remember, a leaf node doesn't store a pointer to the referenced row's physical location;

rather, it stores the row's primary key values.

That means that to find a row from a secondary index, the storage engine first finds the

leaf node in the secondary index and then uses the primary key values stored there to

navigate the primary key and find the row. That's double work: two B-Tree navigations

instead of one. 8 In InnoDB, the adaptive hash index can help reduce this penalty.

Comparison of InnoDB and MyISAM data layout

The differences between clustered and nonclustered data layouts, and the correspond-

ing differences between primary and secondary indexes, can be confusing and surpris-

ing. Let's see how InnoDB and MyISAM lay out the following table:

CREATE TABLE layout_test (

col1 int NOT NULL,

col2 int NOT NULL,

PRIMARY KEY(col1),

KEY(col2)

);

Suppose the table is populated with primary key values 1 to 10,000, inserted in random

order and then optimized with OPTIMIZE TABLE . In other words, the data is arranged

optimally on disk, but the rows might be in a random order. The values for col2 are

randomly assigned between 1 and 100, so there are lots of duplicates.

8. Nonclustered index designs aren't always able to provide single-operation row lookups, by the way. When

a row changes it might not fit in its original location anymore, so you might end up with fragmented rows

or “forwarding addresses” in the table, both of which would result in more work to find the row.

Search WWH ::

Custom Search

Home