Databases Reference
In-Depth Information
This query works by eliminating any films that have actors. Each film might have
many actors, but as soon as it finds one actor, it stops processing the current film
and moves to the next one because it knows the WHERE clause prohibits outputting
that film. A similar “Distinct/not-exists” optimization can apply to certain kinds
of DISTINCT , NOT EXISTS() , and LEFT JOIN queries.
Equality propagation
MySQL recognizes when a query holds two columns as equal—for example, in a
JOIN condition—and propagates WHERE clauses across equivalent columns. For in-
stance, in the following query:
mysql> SELECT film.film_id
-> FROM sakila.film
-> INNER JOIN sakila.film_actor USING(film_id)
-> WHERE film.film_id > 500;
MySQL knows that the WHERE clause applies not only to the film table but to the
film_actor table as well, because the USING clause forces the two columns to match.
If you're used to another database server that can't do this, you might have been
advised to “help the optimizer” by manually specifying the WHERE clause for both
tables, like this:
... WHERE film.film_id > 500 AND film_actor.film_id > 500
This is unnecessary in MySQL. It just makes your queries harder to maintain.
IN() list comparisons
In many database servers, IN() is just a synonym for multiple OR clauses, because
the two are logically equivalent. Not so in MySQL, which sorts the values in the
IN() list and uses a fast binary search to see whether a value is in the list. This is
O(log n ) in the size of the list, whereas an equivalent series of OR clauses is O( n ) in
the size of the list (i.e., much slower for large lists).
The preceding list is woefully incomplete, because MySQL performs more optimiza-
tions than we could fit into this entire chapter, but it should give you an idea of the
optimizer's complexity and intelligence. If there's one thing you should take away from
this discussion, it's don't try to outsmart the optimizer . You might end up just defeating
it, or making your queries more complicated and harder to maintain for zero benefit.
In general, you should let the optimizer do its work.
Of course, as smart as the optimizer is, there are times when it doesn't give the best
result. Sometimes you might know something about the data that the optimizer doesn't,
such as a fact that's guaranteed to be true because of application logic. Also, sometimes
the optimizer doesn't have the necessary functionality, such as hash indexes; at other
times, as mentioned earlier, its cost estimates might prefer a query plan that turns out
to be more expensive than an alternative.
 
Search WWH ::




Custom Search