Physical Data Warehouse Design - Data Warehouse Systems: Design and Implementation

Database Reference

In-Depth Information

found in the Northwind data warehouse. In parent-child hierarchies, we

normally want to aggregate measures of child members into parent members.

For example, when considering an employee Andrew Fuller who is head

of the Northwind company (see Fig. 6.2 ), we may want to report not

only his sales amount but his sales amount plus the sum of the sales

amount of the employees that directly and indirectly report to him (Nancy

Davolio, Janet Leverling, and so on). Mondrian provides a special structure

called a closure table , which basically contains the transitive closure of

the hierarchy. This table has schema ( SupervisorKey , EmployeeKey , Distance ),

where the third attribute contains the distance from the two employees in

the hierarchy. For the hierarchy in Fig. 6.2 , the closure table will contain,

for example, the tuples ( 2 , 2 , 0 ), ( 2 , 5 , 1 ), and ( 2 , 6 , 2 ), which correspond

to ( Andrew Fuller , Andrew Fuller , 0 ), ( Andrew Fuller , Steven Buchanan , 1 ), and

( Andrew Fuller , Michael Suyama , 2 ). It follows from the above that a closure

table is similar to an aggregate table in the sense that it contains a redundant

copy of the data in the database. Note that while an aggregate table speeds

up aggregation, a closure table makes the computation of rollups along a

parent-child hierarchy more ecient.

When a query matches the definition of an aggregate table, Mondrian uses

such a table to answer the query instead of computing the aggregate from

scratch, basically applying the theoretical concepts studied in this chapter.

If more than one aggregate table matches a particular query, Mondrian must

choose between them. This is done as follows: If there is an aggregate table of

the same granularity as the query, Mondrian uses it. If there is no aggregate

table at the desired granularity, Mondrian picks an aggregate table of lower

granularity and rolls up from it. In general, Mondrian chooses the aggregate

table with the fewest rows (the heuristic we have called “smallest parent”).

7.10.2 Caching

Another feature provided by Mondrian to speed up query performance is

caching data in main memory, to avoid accessing the database to retrieve

schemas, dimension members, and facts. Mondrian provides three different

kinds of caches:

The schema cache , which keeps schemas in memory to avoid reading them

each time a cube is loaded. This cache stores the schema in memory after

it has been read for the first time and keeps it in memory until the cache

is cleared. Each time the schema is updated, the cache must be cleared.

The member cache , which stores dimension members in memory. The

member cache must also be synchronized with the underlying data. A

Service Provider Interface is used to flush the members from the cache.

The member cache is populated when members of a dimension are first

read, and then members are retrieved as needed. Like it is the case in

Search WWH ::

Custom Search

Home