Database Reference
In-Depth Information
Changes do not happen very often. If they do, then we might be better off splitting
the attribute off into a separate dimension. If the changes happen rarely, then a
technique known as Slowly Changing Dimensions ( SCD ) is the solution and we
need to model this into our dimensions.
We can record SCDs using the following different techniques:
Type 1 : We maintain only the last value of each attribute in the dimension
table. If a customer changes address, then the previous one is lost and all
the previous facts will be shown as if the customer always lived at the same
address (new address).
Type 2 : We create a new record in the dimension table whenever a change
happens. All previous facts will still be linked to the old record. Thus, in our
customer address example, the old facts will be linked to the old address and
the new facts will be linked to the new address.
Type 3 : If what we want is simply to know the "last old value" of a specific
attribute of a dimension, we can add a field to the dimension table in order to
save just the "last value of the attribute" before updating it. In the real world,
this type of dimension is used very rarely.
The SCD type used is almost never the same across all the dimensions in a project.
We will normally end up with several dimensions of Type 1 and occasionally with
a couple of dimensions of Type 2. Also, not all the attributes of a dimension have to
have the same SCD behavior. History is not usually stored for the date of birth of a
customer, if it changes, since the chances are that the previous value was a mistake.
On the other hand, the address of the same customer is likely to be an SCD attribute
as it's likely we'll want to track changes for it. Finally, there may be the need to use
the same dimension with different slowly changing types in different cubes, as
shown. Handling these changes will inevitably make our ETL more complex:
Type 1 dimensions are relatively easy to handle and to manage. Each time we
detect a change, we apply it to the dimension table in the data mart and that
is all the work we need to do.
Type 2 dimensions are more complex. When we detect a change, we
invalidate the old record by setting its "end of validity date" and insert a new
record with the new values. As all the new data will refer to the new record,
it is simple to use in queries. We should have only one valid record for each
entity in the dimension.
The modeling of SCDs in Analysis Services will be covered later, but in this
theoretical discussion, it might be interesting to spend some time on the different
ways to model Type 2 SCDs in the relational data mart.
 
Search WWH ::




Custom Search