Database Reference
In-Depth Information
All this is very good advice. Nevertheless, there are situations in which the rules
surrounding the usage of surrogate keys should be relaxed or—to put it another
way—there can be times when it's useful to make the surrogate keys meaningful
instead of meaningless. Let's consider some of the times when this might be the case:
•
Date
: We can use a meaningless key as a surrogate key for the
Date
dimension. However, is there any point in doing so? In our opinion, the best
representation of a date surrogate key is an integer in the form YYYYMMDD,
so 20080109 represents January 9, 2008. Note that even the Kimball Group,
writing in the topic
The Microsoft Data Warehouse Toolkit
, accept that this can
be a good idea. The main reason for this is that it makes SQL queries that
filter by date much easier to write and much more readable; we very often
want to partition a measure group by date, for instance. The reason that it's
safe to do this is that the
Date
dimension will never change. You might add
some attributes to a
Date
dimension table and you might load new data into
it, but the data that is already there should never need to be altered.
All invalid dates may be easily represented with negative numbers, so -1
may be the unknown date, -2 may be the empty date, and so on. We will
have plenty of space for all the dummy dates we will ever need. A word
of warning about the type of the key: we sometimes face situations where
the
DateTime
type has been used for the key of the
Date
dimension. This is
absolutely the wrong thing to do, as not only is a
DateTime
representation
going to be bigger than the
INT
representation, the
DateTime
type does not
let us add dummy values to the dimension easily.
•
Ranges
: Suppose we want a dimension that will rank the sales based on
the amount of the specific sale. We want to analyze information based on a
range, not on each single amount.
If we define a
Range Amount
dimension and an
ID_RangeAmount
key in the
fact table, this will solve our modeling problem. However, what will happen
when the customer wants to change the ranges? We will have to re-compute
the whole fact table because the
ID_RangeAmount
key will become useless.
On the other hand, if you decide that $100 will be the granularity of the range
dimension, you can use
FLOOR (Amount/100)
as
ID_RangeAmount
and,
in doing so, you will be able to update the attributes of the
RangeAmount
dimension that will lead to hierarchies without updating the fact table. The
advantages of doing this are discussed in more detail in the following blog
entry:
http://tinyurl.com/rangedims
, and we discuss how to model this
in Analysis Services in
Chapter 4
,
Measures and Measure Groups.
Search WWH ::
Custom Search