Database Reference
In-Depth Information
All this is very good advice. Nevertheless, there are situations in which the rules
surrounding the usage of surrogate keys should be relaxed or—to put it another
way—there can be times when it's useful to make the surrogate keys meaningful
instead of meaningless. Let's consider some of the times when this might be the case:
Date : We can use a meaningless key as a surrogate key for the Date
dimension. However, is there any point in doing so? In our opinion, the best
representation of a date surrogate key is an integer in the form YYYYMMDD,
so 20080109 represents January 9, 2008. Note that even the Kimball Group,
writing in the topic The Microsoft Data Warehouse Toolkit , accept that this can
be a good idea. The main reason for this is that it makes SQL queries that
filter by date much easier to write and much more readable; we very often
want to partition a measure group by date, for instance. The reason that it's
safe to do this is that the Date dimension will never change. You might add
some attributes to a Date dimension table and you might load new data into
it, but the data that is already there should never need to be altered.
All invalid dates may be easily represented with negative numbers, so -1
may be the unknown date, -2 may be the empty date, and so on. We will
have plenty of space for all the dummy dates we will ever need. A word
of warning about the type of the key: we sometimes face situations where
the DateTime type has been used for the key of the Date dimension. This is
absolutely the wrong thing to do, as not only is a DateTime representation
going to be bigger than the INT representation, the DateTime type does not
let us add dummy values to the dimension easily.
Ranges : Suppose we want a dimension that will rank the sales based on
the amount of the specific sale. We want to analyze information based on a
range, not on each single amount.
If we define a Range Amount dimension and an ID_RangeAmount key in the
fact table, this will solve our modeling problem. However, what will happen
when the customer wants to change the ranges? We will have to re-compute
the whole fact table because the ID_RangeAmount key will become useless.
On the other hand, if you decide that $100 will be the granularity of the range
dimension, you can use FLOOR (Amount/100) as ID_RangeAmount and,
in doing so, you will be able to update the attributes of the RangeAmount
dimension that will lead to hierarchies without updating the fact table. The
advantages of doing this are discussed in more detail in the following blog
entry: http://tinyurl.com/rangedims , and we discuss how to model this
in Analysis Services in Chapter 4 , Measures and Measure Groups.
 
Search WWH ::




Custom Search