Databases Reference
In-Depth Information
to reorganize the table with OPTIMIZE TABLE after loading a lot of data if you didn't
load the rows in primary key order.
• Updating the clustered index columns is expensive, because it forces InnoDB to
move each updated row to a new location.
• Tables built upon clustered indexes are subject to page splits when new rows are
inserted, or when a row's primary key is updated such that the row must be moved.
A page split happens when a row's key value dictates that the row must be placed
into a page that is full of data. The storage engine must split the page into two to
accommodate the row. Page splits can cause a table to use more space on disk.
• Clustered tables can be slower for full table scans, especially if rows are less densely
packed or stored nonsequentially because of page splits.
• Secondary (nonclustered) indexes can be larger than you might expect, because
their leaf nodes contain the primary key columns of the referenced rows.
• Secondary index accesses require two index lookups instead of one.
The last point can be a bit confusing. Why would a secondary index require two index
lookups? The answer lies in the nature of the “row pointers” the secondary index stores.
Remember, a leaf node doesn't store a pointer to the referenced row's physical location;
rather, it stores the row's primary key values.
That means that to find a row from a secondary index, the storage engine first finds the
leaf node in the secondary index and then uses the primary key values stored there to
navigate the primary key and find the row. That's double work: two B-Tree navigations
instead of one. 8 In InnoDB, the adaptive hash index can help reduce this penalty.
Comparison of InnoDB and MyISAM data layout
The differences between clustered and nonclustered data layouts, and the correspond-
ing differences between primary and secondary indexes, can be confusing and surpris-
ing. Let's see how InnoDB and MyISAM lay out the following table:
CREATE TABLE layout_test (
col1 int NOT NULL,
col2 int NOT NULL,
PRIMARY KEY(col1),
KEY(col2)
);
Suppose the table is populated with primary key values 1 to 10,000, inserted in random
order and then optimized with OPTIMIZE TABLE . In other words, the data is arranged
optimally on disk, but the rows might be in a random order. The values for col2 are
randomly assigned between 1 and 100, so there are lots of duplicates.
8. Nonclustered index designs aren't always able to provide single-operation row lookups, by the way. When
a row changes it might not fit in its original location anymore, so you might end up with fragmented rows
or “forwarding addresses” in the table, both of which would result in more work to find the row.
 
Search WWH ::




Custom Search