Data Warehouse Concepts - Data Warehouse Systems: Design and Implementation


Distributive measures are defined by an aggregation function that can be computed in a distributed way. Suppose that the data are partitioned into n sets and that the aggregate function is applied to each set, giving n aggregated values. The function is distributive if the result of applying it to the whole data set is the same as the result of applying a function (not necessarily the same) to the n aggregated values. For example, the count of the whole data set is obtained by summing the n partial counts. The usual aggregation functions, such as count, sum, minimum, and maximum, are distributive. However, the distinct count function is not. For instance, if we partition the set of measure values {3, 3, 4, 5, 8, 4, 7, 3, 8} into the subsets {3, 3, 4}, {5, 8, 4}, and {7, 3, 8}, summing the results of the distinct count function applied to each subset gives 2 + 3 + 3 = 8, while the answer over the original set is 5, since it contains only the distinct values 3, 4, 5, 7, and 8.

Algebraic measures are defined by an aggregation function that can be expressed as a scalar function of distributive ones. A typical example of an algebraic aggregation function is the average, which can be computed by dividing the sum by the count, both of which are distributive.

Holistic measures are measures that cannot be computed from other subaggregates. Typical examples include the median, the mode, and the rank. Holistic measures are expensive to compute, especially when data are modified, since they must be computed from scratch.
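A minimal Python sketch that replays the example above on the same partition and checks each category of measure. The helper `combine_partials` and the `DISTRIBUTIVE` table are hypothetical names chosen for this illustration. The median-of-medians line shows one natural but incorrect way to combine partial medians; it illustrates why the median is holistic but is not a proof that no combination could work.

```python
from statistics import median

# Measure values and partition from the distinct-count example in the text.
values = [3, 3, 4, 5, 8, 4, 7, 3, 8]
partitions = [[3, 3, 4], [5, 8, 4], [7, 3, 8]]


def combine_partials(func, combine, parts):
    """Apply func to each partition, then merge the n partial results with combine."""
    return combine(func(p) for p in parts)


# Distributive functions: (function on raw values, combining function on partial results).
# Note that count is combined with sum, not with count.
DISTRIBUTIVE = {
    "count": (len, sum),
    "sum": (sum, sum),
    "min": (min, min),
    "max": (max, max),
}

for name, (func, combine) in DISTRIBUTIVE.items():
    # Combining the partial results gives the same answer as the whole data set.
    assert combine_partials(func, combine, partitions) == func(values), name


def distinct_count(xs):
    return len(set(xs))


# Distinct count is not distributive: summing per-partition results counts the
# values 3, 4, and 8 twice, because each occurs in two partitions.
summed_distinct = combine_partials(distinct_count, sum, partitions)  # 2 + 3 + 3 = 8
true_distinct = distinct_count(values)                               # {3, 4, 5, 7, 8} -> 5

# Algebraic: average = sum / count, so each partition only has to report (sum, count).
partial_pairs = [(sum(p), len(p)) for p in partitions]
average = sum(s for s, _ in partial_pairs) / sum(c for _, c in partial_pairs)  # 45 / 9

# Holistic: combining the partition medians does not recover the overall median.
median_of_medians = median([median(p) for p in partitions])  # median of [3, 5, 7]
true_median = median(values)                                 # middle of sorted values

print(summed_distinct, true_distinct, average, median_of_medians, true_median)
# prints: 8 5 5.0 5 4
```

The average only needs the pair (sum, count) from each partition, which is what makes it algebraic rather than holistic: the amount of information each partition must report is fixed, regardless of how many values it holds.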

3.2 OLAP Operations

As already mentioned, a fundamental characteristic of the multidimensional model is that it allows one to view data from multiple perspectives and at several levels of detail. The OLAP operations allow these perspectives and levels of detail to be materialized by exploiting the dimensions and their hierarchies, thus providing an interactive data analysis environment.

Figure 3.4 presents a possible scenario that shows how an end user can operate over a data cube to analyze data in different ways. Later in this section, we present the OLAP operations in detail. Our user starts from Fig. 3.4a, a cube containing quarterly sales quantities (in thousands) by product category and customer city for the year 2012.

The user first wants to compute the sales quantities by country. For this, she applies a roll-up operation to the Country level along the Customer dimension. The result is shown in Fig. 3.4b. While the original cube contained four values in the Customer dimension, one for each city, the new cube contains two values, each one corresponding to one country. The remaining dimensions are not affected. Concretely, for a given quarter and category, the values in the cells pertaining to Paris and Lyon are aggregated into the corresponding value for France. The computation of the cells pertaining to Germany proceeds analogously.
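The roll-up from City to Country can be sketched in Python as a group-by on the cube cells. This is a minimal illustration under stated assumptions: the cube is held as a dictionary keyed by (quarter, category, city); the German cities, the category name, and all quantities are hypothetical and are not the values of Fig. 3.4; the function name `roll_up_to_country` is likewise made up for the sketch.

```python
from collections import defaultdict

# Hypothetical cube: (quarter, category, city) -> sales quantity in thousands.
# The German cities, the category name, and all quantities are made up for
# illustration and are NOT the values shown in Fig. 3.4.
sales = {
    ("Q1", "Beverages", "Paris"): 21,
    ("Q1", "Beverages", "Lyon"): 12,
    ("Q1", "Beverages", "Berlin"): 10,
    ("Q1", "Beverages", "Bonn"): 7,
    ("Q2", "Beverages", "Paris"): 27,
    ("Q2", "Beverages", "Lyon"): 14,
    ("Q2", "Beverages", "Berlin"): 11,
    ("Q2", "Beverages", "Bonn"): 9,
}

# Customer hierarchy, City -> Country (assumed mapping for the sketch).
CITY_TO_COUNTRY = {
    "Paris": "France",
    "Lyon": "France",
    "Berlin": "Germany",
    "Bonn": "Germany",
}


def roll_up_to_country(cube, city_to_country):
    """Roll up the Customer dimension from City to Country by summing.

    Quarter and category coordinates are kept as they are; only the city
    coordinate is replaced by its country, and cells that collide are added.
    """
    result = defaultdict(int)
    for (quarter, category, city), quantity in cube.items():
        result[(quarter, category, city_to_country[city])] += quantity
    return dict(result)


by_country = roll_up_to_country(sales, CITY_TO_COUNTRY)
for cell, quantity in sorted(by_country.items()):
    print(cell, quantity)
# ('Q1', 'Beverages', 'France') 33
# ('Q1', 'Beverages', 'Germany') 17
# ('Q2', 'Beverages', 'France') 41
# ('Q2', 'Beverages', 'Germany') 20
```

Summing works here because sales quantity is aggregated with sum, a distributive function, so the country cells can be computed from the city cells alone. A measure aggregated with a holistic function such as the median could not be rolled up from city-level results in this way.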
