Designing the Data Warehouse for Analysis Services - Expert Cube Development with SSAS Multidimensional Models

Database Reference

In-Depth Information

All this is very good advice. Nevertheless, there are situations in which the rules

surrounding the usage of surrogate keys should be relaxed or—to put it another

way—there can be times when it's useful to make the surrogate keys meaningful

instead of meaningless. Let's consider some of the times when this might be the case:

•

Date : We can use a meaningless key as a surrogate key for the Date

dimension. However, is there any point in doing so? In our opinion, the best

representation of a date surrogate key is an integer in the form YYYYMMDD,

so 20080109 represents January 9, 2008. Note that even the Kimball Group,

writing in the topic The Microsoft Data Warehouse Toolkit , accept that this can

be a good idea. The main reason for this is that it makes SQL queries that

filter by date much easier to write and much more readable; we very often

want to partition a measure group by date, for instance. The reason that it's

safe to do this is that the Date dimension will never change. You might add

some attributes to a Date dimension table and you might load new data into

it, but the data that is already there should never need to be altered.

All invalid dates may be easily represented with negative numbers, so -1

may be the unknown date, -2 may be the empty date, and so on. We will

have plenty of space for all the dummy dates we will ever need. A word

of warning about the type of the key: we sometimes face situations where

the DateTime type has been used for the key of the Date dimension. This is

absolutely the wrong thing to do, as not only is a DateTime representation

going to be bigger than the INT representation, the DateTime type does not

let us add dummy values to the dimension easily.

•

Ranges : Suppose we want a dimension that will rank the sales based on

the amount of the specific sale. We want to analyze information based on a

range, not on each single amount.

If we define a Range Amount dimension and an ID_RangeAmount key in the

fact table, this will solve our modeling problem. However, what will happen

when the customer wants to change the ranges? We will have to re-compute

the whole fact table because the ID_RangeAmount key will become useless.

On the other hand, if you decide that $100 will be the granularity of the range

dimension, you can use FLOOR (Amount/100) as ID_RangeAmount and,

in doing so, you will be able to update the attributes of the RangeAmount

dimension that will lead to hierarchies without updating the fact table. The

advantages of doing this are discussed in more detail in the following blog

entry: http://tinyurl.com/rangedims , and we discuss how to model this

in Analysis Services in Chapter 4 , Measures and Measure Groups.

Search WWH ::

Custom Search

Home