Databases Reference
In-Depth Information
for any given query. You can force the optimizer to choose one method or the other
with the SQL_BIG_RESULT and SQL_SMALL_RESULT optimizer hints, as discussed earlier in
this chapter.
If you need to group a join by a value that comes from a lookup table, it's usually more
efficient to group by the lookup table's identifier than by the value. For example, the
following query isn't as efficient as it could be:
mysql> SELECT actor.first_name, actor.last_name, COUNT(*)
-> FROM sakila.film_actor
-> INNER JOIN sakila.actor USING(actor_id)
-> GROUP BY actor.first_name, actor.last_name;
The query is more efficiently written as follows:
mysql> SELECT actor.first_name, actor.last_name, COUNT(*)
-> FROM sakila.film_actor
-> INNER JOIN sakila.actor USING(actor_id)
-> GROUP BY film_actor.actor_id;
Grouping by actor.actor_id could be even more efficient than grouping by film_
actor.actor_id . You should test on your specific data to see.
This query takes advantage of the fact that the actor's first and last name are dependent
on the actor_id , so it will return the same results, but it's not always the case that you
can blithely select nongrouped columns and get the same result. You might even have
the server's SQL_MODE configured to disallow it. You can use MIN() or MAX() to work
around this when you know the values within the group are distinct because they de-
pend on the grouped-by column, or if you don't care which value you get:
mysql> SELECT MIN(actor.first_name), MAX(actor.last_name), ...;
Purists will argue that you're grouping by the wrong thing, and they're right. A spurious
MIN() or MAX() is a sign that the query isn't structured correctly. However, sometimes
your only concern will be making MySQL execute the query as quickly as possible. The
purists will be satisfied with the following way of writing the query:
mysql> SELECT actor.first_name, actor.last_name, c.cnt
-> FROM sakila.actor
-> INNER JOIN (
-> SELECT actor_id, COUNT(*) AS cnt
-> FROM sakila.film_actor
-> GROUP BY actor_id
-> ) AS c USING(actor_id) ;
But the cost of creating and filling the temporary table required for the subquery may
be high compared to the cost of fudging pure relational theory a little bit. Remember,
the temporary table created by the subquery has no indexes. 17
It's generally a bad idea to select nongrouped columns in a grouped query, because the
results will be nondeterministic and could easily change if you change an index or the
17. This is another limitation that's fixed in MariaDB, by the way.
 
Search WWH ::




Custom Search