Databases Reference
In-Depth Information
history. For example, if the Profit_Fact table becomes too large, data older than two
years could be purged, while the Profit_by_Invoice_Date table can hold five years'
worth of data and still be very small. The purging of detailed data while holding on to
summary data requires an incremental update strategy.
The time available for updates is another constraint to consider. Often, an update
window is set aside during off hours, when the materialized views are updated in a ded-
icated disk. Determine the amount of time available for the update window, if applica-
ble. You may determine that an update window is not possible. Perhaps your organiza-
tion operates 24 hours a day, and the system cannot be made unavailable for any update
window. Perhaps the users need real-time summary data, and can't use data that was
computed last night. Incremental updates can often address these problems.
Instead of computing a view from scratch, incremental updates keep the majority
of the rows of a materialized view intact, and add new results as appropriate. For exam-
ple, consider the Profit_by_Invoice_Date table. Let's say the invoices are processed in a
daily cycle. Information is gathered for the jobs that have shipped since the previous
invoice cycle. Each customer that has jobs in the current cycle has an invoice assigned.
The customer's jobs are placed on the invoice. After the invoices are verified, the
invoices are printed and sent to the customers, and the status of each invoice is updated
in the database. As the cycle completes, the sell value for each job is known. The costs
for each job are also known, since the Job_Costing table is updated in real time and
work completes on each job before it is invoiced. We can calculate the daily totals and
add a single row to the Profit_by_Invoice_Date table. Updating the materialized views
in real time is sometimes referred to as a trickle feed .
5.5 Examples: The Good, the Bad, and the Ugly
Materialized views and star schemas are good approaches for reducing disk I/O, result-
ing in quicker query responses. Figure 5.6 is an example of a star schema, illustrating the
dimensional design approach. Typically, there are more dimensions in a star schema, but
the example is sufficient for discussion. The fact table is a materialized view derived
from the base tables. The materialized view amounts to stored calculations that reduce
the amount of disk I/O. The dimension tables provide a means to group data by various
levels along a variety of dimensions. The dimension tables add flexibility, multiplying
the gains of the fact table over a larger family of queries. The star schema and the
dimensional design approach can be good for improving query responses.
A question naturally arises. If precomputing saves time, then why not precompute
the joins between the fact table and the dimension tables? Wouldn't this save even more
time? The schema shown in Figure 5.9 is a possible consolidation of the fact table with
the dimension tables from Figure 5.6. Now we can query just one table, and we still
have the same ability to group by levels that we had with the dimension tables.
Let's rewrite the second query from Section 5.3. The query is now simpler:
Search WWH ::




Custom Search