Figure 1-1. Disk seek versus sequential access
The end result of all this math? The seek takes well over 99% of the time spent reading a row.
When it comes to disk access, random seeks are the enemy. The reason this matters so much in this context is that JOINs typically require random seeks. Given our normalized data model, a likely plan for our query would be something similar to the following Python code:
for number_row in find_by_contact_id(numbers, 3):
    yield (contact_row.name, number_row.number)
So there ends up being at least one disk seek for every contact in our database. Of course, we've glossed over how find_by_contact_id works, assuming that all it needs to do is a single disk seek. Typically, this is actually accomplished by reading an index on numbers that is keyed by contact_id, potentially resulting in even more disk seeks.
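To make the index lookup concrete, here is a minimal sketch of how find_by_contact_id might work against a secondary index. It is a hypothetical illustration, not the book's implementation: the extra index parameter and the in-memory dicts standing in for on-disk pages are assumptions. On a real system, the index probe and each row fetch would each cost a disk seek.

```python
def build_index(rows, key):
    # Map each key value to the positions of the rows holding it
    # (a toy stand-in for a B-tree index on disk).
    index = {}
    for pos, row in enumerate(rows):
        index.setdefault(row[key], []).append(pos)
    return index

def find_by_contact_id(numbers, contact_id, index):
    # One probe into the index, then one fetch per matching row --
    # each of these would be a separate seek on disk.
    for pos in index.get(contact_id, []):
        yield numbers[pos]

numbers = [
    {"contact_id": 3, "number": "555-0100"},
    {"contact_id": 1, "number": "555-0101"},
    {"contact_id": 3, "number": "555-0102"},
]
idx = build_index(numbers, "contact_id")
print([row["number"] for row in find_by_contact_id(numbers, 3, idx)])
# prints ['555-0100', '555-0102']
```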
Of course, modern database systems have evolved structures to mitigate some of this, largely by caching frequently used objects (particularly indexes) in RAM. However, even with such optimizations, joining tables is one of the most expensive operations that relational databases do. Additionally, if you end up needing to scale your database to multiple servers, you introduce the problem of generating a distributed join, a complex and generally slow operation.
Denormalizing for Performance
The dirty little secret (which isn't really so secret) about relational databases is that once we have gone through the data modeling process to generate our nice nth normal form data model, it's often necessary to denormalize the model to reduce the number of JOIN operations required for the queries we execute frequently.
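As a rough sketch of what denormalization buys us, consider storing each contact's numbers inline with the contact row (the field names here are illustrative assumptions, not the book's schema). The frequent "name plus numbers" query then becomes a single sequential pass with no JOIN, and no extra seek per contact:

```python
# Denormalized model: each contact row carries its own numbers,
# so reading a contact reads everything the query needs.
contacts = [
    {"name": "Rick", "numbers": ["555-0100", "555-0102"]},
    {"name": "Jenny", "numbers": ["555-0101"]},
]

def name_number_pairs(contacts):
    # One sequential scan; every (name, number) pair comes from the
    # same row, so there is no random seek into a second table.
    for contact in contacts:
        for number in contact["numbers"]:
            yield (contact["name"], number)

print(list(name_number_pairs(contacts)))
# prints [('Rick', '555-0100'), ('Rick', '555-0102'), ('Jenny', '555-0101')]
```

The trade-off, of course, is redundancy and harder updates, which is why denormalization is usually applied selectively to the queries that run most often.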