Figure 1-1. Disk seek versus sequential access
The end result of all this math? The seek takes well over 99% of the time spent reading a row.
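To make that figure concrete, here is a quick back-of-the-envelope recap; the seek time and transfer rate below are typical rotational-disk ballpark values assumed for illustration, not necessarily the exact numbers used above:

    # Ballpark assumptions: ~10 ms per random seek, ~100 MB/s sequential
    # transfer, and a small 100-byte row.
    seek_time_s = 10e-3
    transfer_rate_bps = 100e6
    row_size_bytes = 100

    transfer_time_s = row_size_bytes / transfer_rate_bps        # about 1 microsecond
    seek_fraction = seek_time_s / (seek_time_s + transfer_time_s)
    print(f"seek share of the read: {seek_fraction:.4%}")       # roughly 99.99%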
When it comes to disk access, random seeks are the enemy. The reason this is so important
in this context is that JOINs typically require random seeks. Given our normalized data
model, a likely plan for our query would be something similar to the following Python code:
for number_row in find_by_contact_id(numbers, 3):
    yield (contact_row.name, number_row.number)
So there ends up being at least one disk seek for every contact in our database. Of course,
we've glossed over how find_by_contact_id works, assuming that all it needs to do is a
single disk seek. Typically, this is actually accomplished by reading an index on numbers that
is keyed by contact_id, potentially resulting in even more disk seeks.
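As a rough illustration of what such a lookup amounts to, the sketch below builds an in-memory index keyed by contact_id; the NumbersTable class and its layout are purely hypothetical, but each index probe and row fetch stands in for at least one disk seek in a real engine:

    class NumbersTable:
        def __init__(self, rows):
            self.rows = rows                     # the table's rows, in storage order
            self.index = {}                      # contact_id -> positions of matching rows
            for position, row in enumerate(rows):
                self.index.setdefault(row["contact_id"], []).append(position)

    def find_by_contact_id(numbers, contact_id):
        # In a real engine, the index probe and each row fetch below would each
        # cost at least one random seek if the pages are not already cached.
        for position in numbers.index.get(contact_id, []):
            yield numbers.rows[position]

    numbers = NumbersTable([
        {"contact_id": 3, "number": "555-0123"},
        {"contact_id": 7, "number": "555-0199"},
    ])
    print([row["number"] for row in find_by_contact_id(numbers, 3)])   # ['555-0123']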
Of course, modern database systems have evolved structures to mitigate some of this, largely
by caching frequently used objects (particularly indexes) in RAM. However, even with such
optimizations, joining tables is one of the most expensive operations that relational databases
do. Additionally, if you end up needing to scale your database to multiple servers, you
introduce the problem of generating a distributed join, a complex and generally slow operation.
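To see why, consider a deliberately naive sketch of a scatter/gather join once the contacts and numbers live on different servers; the shard layout and helper names here are invented for illustration, not any particular database's algorithm:

    # Two "servers" holding contacts and two holding numbers (in-memory stand-ins).
    contacts_shards = [
        [{"contact_id": 1, "name": "Ada"}],
        [{"contact_id": 2, "name": "Grace"}],
    ]
    numbers_shards = [
        [{"contact_id": 2, "number": "555-0199"}],
        [{"contact_id": 1, "number": "555-0123"}],
    ]

    def find_numbers(contact_id):
        # Each lookup may hit a remote shard: a network round trip plays the role
        # the disk seek played on a single server, and there is one per contact.
        owner = numbers_shards[contact_id % len(numbers_shards)]
        return [row for row in owner if row["contact_id"] == contact_id]

    def distributed_join():
        for shard in contacts_shards:
            for contact in shard:
                for number_row in find_numbers(contact["contact_id"]):
                    yield (contact["name"], number_row["number"])

    print(list(distributed_join()))   # [('Ada', '555-0123'), ('Grace', '555-0199')]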
Denormalizing for Performance
The dirty little secret (which isn't really so secret) about relational databases is that once we
have gone through the data modeling process to generate our nice nth normal form data model,
it's often necessary to denormalize the model to reduce the number of JOIN operations
required for the queries we execute frequently.
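As a sketch of what that can look like for our phone book, the snippet below embeds each contact's numbers directly in the contact record (an illustrative shape, not a prescribed schema), so the common query becomes a single sequential pass with no JOIN and no per-contact random seek:

    # Denormalized contacts: each record carries its own phone numbers.
    contacts = [
        {"name": "Ada",   "numbers": ["555-0123", "555-0124"]},
        {"name": "Grace", "numbers": ["555-0199"]},
    ]

    def names_and_numbers(contacts):
        # One sequential scan over contacts; no second table to seek into.
        for contact in contacts:
            for number in contact["numbers"]:
                yield (contact["name"], number)

    print(list(names_and_numbers(contacts)))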