Databases Reference
In-Depth Information
“summary tables,” we mean tables that hold aggregated data from GROUP BY queries
(i.e., data that is not logically redundant). Some people also use the term “roll-up tables”
for these tables, because the data has been “rolled up.”
Staying with the website example, suppose you need to count the number of messages
posted during the previous 24 hours. It would be impossible to maintain an accurate
real-time counter on a busy site. Instead, you could generate a summary table every
hour. You can often do this with a single query, and it's more efficient than maintaining
counters in real time. The drawback is that the counts are not 100% accurate.
If you need to get an accurate count of messages posted during the previous 24-hour
period (with no staleness), there is another option. Begin with a per-hour summary
table. You can then count the exact number of messages posted in a given 24-hour
period by adding the number of messages in the 23 whole hours contained in that
period, the partial hour at the beginning of the period, and the partial hour at the end
of the period. Suppose your summary table is called msg_per_hr and is defined
as follows:
CREATE TABLE msg_per_hr (
hr DATETIME NOT NULL,
cnt INT UNSIGNED NOT NULL,
PRIMARY KEY(hr)
);
You can find the number of messages posted in the previous 24 hours by adding the
results of the following three queries. We're using LEFT(NOW(), 14) to round the current
date and time to the nearest hour:
mysql> SELECT SUM(cnt) FROM msg_per_hr
-> WHERE hr BETWEEN
-> CONCAT(LEFT(NOW(), 14), '00:00') - INTERVAL 23 HOUR
-> AND CONCAT(LEFT(NOW(), 14), '00:00') - INTERVAL 1 HOUR;
mysql> SELECT COUNT(*) FROM message
-> WHERE posted >= NOW() - INTERVAL 24 HOUR
-> AND posted < CONCAT(LEFT(NOW(), 14), '00:00') - INTERVAL 23 HOUR;
mysql> SELECT COUNT(*) FROM message
-> WHERE posted >= CONCAT(LEFT(NOW(), 14), '00:00');
Either approach—an inexact count or an exact count with small range queries to fill
in the gaps—is more efficient than counting all the rows in the message table. This is
the key reason for creating summary tables. These statistics are expensive to compute
in real time, because they require scanning a lot of data, or queries that will only run
efficiently with special indexes that you don't want to add because of the impact they
will have on updates. Computing the most active users or the most frequent “tags” are
typical examples of such operations.
Cache tables, in turn, are useful for optimizing search and retrieval queries. These
queries often require a particular table and index structure that is different from the
one you would use for general online transaction processing (OLTP) operations.
 
Search WWH ::




Custom Search