relational database systems, if a member is already in memory, it does not
need to be read from the database. Members are specific values in a
dimension (like Customer.Country.France), and they include the root as well
as its children. We must give the complete path to a member because a member
is not simply a value but a value together with its dimension level, and
members at the same level can have the same name under different paths in
the hierarchy.
• The segment cache, which holds data from the fact table (usually the
largest table in a warehouse) and contains aggregated data, reducing
the number of calculations to perform. A segment is associated with
a measure, for example, Sales Amount, and also contains a set of
predicates separated by commas (e.g., [CityName = Paris], [CategoryName
= Beverages]) and a list of measure values associated with these predicates
(e.g., the list of sales amounts for beverages in Paris: [120, 259, ...]).
With these values in the cache, aggregations can be computed directly from
the cache when a query includes the cached predicates. The segment cache can
be internal, where the segments are stored in local memory, or external,
where the segments are stored in a data grid, which increases the amount of
data that can be kept in memory by adding servers.
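The two caches described above can be illustrated with a minimal sketch. All class names (`Member`, `MemberCache`, `Segment`, `SegmentCache`) are hypothetical and are not Mondrian's API. The member cache is assumed to be a dictionary keyed by the member's full path, which keeps two cities named Paris at the same level apart. The segment cache is assumed to answer a query only when a segment has the same measure and exactly the same predicate set; it does not model how Mondrian actually matches or rolls up segments. The value 75.0 is made up to complete the elided list of sales amounts.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Member:
    """A dimension member: its full path plus the level it belongs to."""
    path: Tuple[str, ...]
    level: str

    @property
    def name(self) -> str:
        return self.path[-1]


class MemberCache:
    """Hypothetical member cache keyed by the member's full path."""

    def __init__(self) -> None:
        self._members: Dict[Tuple[str, ...], Member] = {}
        self.db_reads = 0  # counts simulated database reads

    def get(self, path: Tuple[str, ...], level: str) -> Member:
        if path not in self._members:  # miss: simulate a database read
            self.db_reads += 1
            self._members[path] = Member(path=path, level=level)
        return self._members[path]


@dataclass(frozen=True)
class Segment:
    """Hypothetical cached segment: a measure, its predicates, and values."""
    measure: str
    predicates: FrozenSet[Tuple[str, str]]
    values: Tuple[float, ...]


class SegmentCache:
    """Hypothetical segment cache keyed by (measure, predicate set)."""

    def __init__(self) -> None:
        self._segments: Dict[Tuple[str, FrozenSet[Tuple[str, str]]], Segment] = {}

    def put(self, segment: Segment) -> None:
        self._segments[(segment.measure, segment.predicates)] = segment

    def total(self, measure: str, predicates: Dict[str, str]) -> Optional[float]:
        """SUM of the cached values on an exact predicate match, else None."""
        segment = self._segments.get((measure, frozenset(predicates.items())))
        return None if segment is None else sum(segment.values)


members = MemberCache()
paris_fr = members.get(("France", "Paris"), "City")
paris_us = members.get(("USA", "Paris"), "City")
members.get(("France", "Paris"), "City")  # served from memory, no read
print(paris_fr.name == paris_us.name, paris_fr == paris_us, members.db_reads)
# True False 2

segments = SegmentCache()
segments.put(Segment(
    "Sales Amount",
    frozenset({("CityName", "Paris"), ("CategoryName", "Beverages")}),
    (120.0, 259.0, 75.0),  # 75.0 is a made-up value
))
print(segments.total("Sales Amount",
                     {"CategoryName": "Beverages", "CityName": "Paris"}))  # 454.0
print(segments.total("Sales Amount", {"CityName": "Lyon"}))  # None (miss)
```

Using a frozen set of predicates as part of the key makes the lookup independent of the order in which a query lists its predicates.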
Mondrian automatically updates the caches as schemas and dimensions
are read and aggregates are calculated. As is usual in caching, the first
user to access the data is the one who populates the cache rather than
benefiting from it. However, the cache can also be populated in advance, so
that it benefits users from the start; this is called precaching. In
Mondrian, precaching is normally done through XML for Analysis (XMLA) web
service calls (recall from Chap. 6 that XMLA is a SOAP-based standard for
making web service calls).
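The difference between lazy population and precaching can be shown with a minimal sketch. The `QueryCache` class and `slow_loader` function are hypothetical stand-ins: the loader plays the role of an expensive query against the warehouse, and the sketch does not model XMLA or any Mondrian API. Without the warm-up step, the first request for each query would count as a miss.

```python
from typing import Callable, Dict, List


class QueryCache:
    """Hypothetical lazily populated cache with an optional warm-up step."""

    def __init__(self, loader: Callable[[str], float]) -> None:
        self._loader = loader  # stands in for an expensive database query
        self._results: Dict[str, float] = {}
        self.misses = 0

    def get(self, query: str) -> float:
        if query not in self._results:
            self.misses += 1  # this caller pays the cost of filling the cache
            self._results[query] = self._loader(query)
        return self._results[query]

    def precache(self, queries: List[str]) -> None:
        """Run anticipated queries ahead of time so users hit a warm cache."""
        for query in queries:
            self.get(query)


def slow_loader(query: str) -> float:
    """Hypothetical loader; returns a placeholder result."""
    return float(len(query))


cache = QueryCache(slow_loader)
cache.precache(["sales by country", "sales by category"])
misses_before_users = cache.misses

cache.get("sales by country")  # first user request: served from warm cache
print(cache.misses - misses_before_users)  # 0
```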
When the data sources change, the cache becomes outdated with respect to
the actual data and must be flushed. When the schema cache is flushed, its
associated member and segment caches are also flushed. Most tools that use
Mondrian, like Pentaho, provide a way to flush the cache manually; Pentaho
provides the Enterprise Console or the User Console for this. A more
efficient approach is to automate cache flushing by making it part of the
ETL process.
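The cascading flush and its placement at the end of an ETL run can be sketched as follows. The names `SimpleCache`, `SchemaCache`, and `run_etl` are hypothetical and are not Mondrian's or Pentaho's API; the sketch only mirrors the dependency described above, where flushing the schema cache also flushes its member and segment caches, and the flush runs right after new data is loaded.

```python
from typing import Callable, Dict


class SimpleCache:
    """Hypothetical cache holding arbitrary entries."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: Dict[str, object] = {}

    def flush(self) -> None:
        self.entries.clear()


class SchemaCache(SimpleCache):
    """Flushing the schema cache also flushes its dependent caches."""

    def __init__(self) -> None:
        super().__init__("schema")
        self.members = SimpleCache("member")
        self.segments = SimpleCache("segment")

    def flush(self) -> None:
        super().flush()
        self.members.flush()
        self.segments.flush()


def run_etl(schema_cache: SchemaCache, load_step: Callable[[], None]) -> None:
    """Hypothetical ETL job that flushes the cache once new data is loaded."""
    load_step()           # load new data into the warehouse
    schema_cache.flush()  # cached data is now outdated, so discard it


cache = SchemaCache()
cache.entries["Sales"] = "cube definition"
cache.members.entries["France"] = "member"
cache.segments.entries["Sales Amount|Paris"] = [120, 259]

run_etl(cache, load_step=lambda: None)
print(len(cache.entries), len(cache.members.entries), len(cache.segments.entries))
# 0 0 0
```

Placing the flush inside the ETL job ties cache invalidation to the moment the data actually changes, instead of relying on someone remembering to flush manually.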
7.11 Summary
In this chapter, we studied the problem of physical data warehouse design. We
focused on three techniques: view materialization, indexing, and partitioning.
For the first of these, we studied the problem of incremental view
maintenance, that is, how and when a view can be updated without recomputing
it from scratch. In addition, we presented algorithms that efficiently compute
the data cube