Database Reference
In-Depth Information
Data updates are an even bigger problem still, both on facts and dimensions.
Data updates on fact tables suffer from the same problems as adding a new field.
Often, the number of rows that we need to update is so high that running even
simple SQL commands can take a very long time.
Data updates on dimensions can be a problem because they may require very
complex logic. Suppose we have a Type 2 SCD and that a record was entered into
the dimension table with incorrect attribute values. In this situation, we would have
created a new record and linked all the facts received after its creation to the new
(and incorrect) record. Recovering from this situation requires us to issue very precise
UPDATE statements to the relational database and to recalculate all the fact table rows
that depend—for any reason—on the incorrect record. Bad data in dimensions is not
very easy to spot, and sometimes several days—if not months—pass before someone
(in the worst case the user) discovers that something went wrong.
There is no good recipe for stopping bad data getting into your data warehouse.
When it happens, we need to be ready to spend a long time trying to recover from
the error. It's worth pointing out that data warehouses or data marts that are rebuilt
each night ("one shot databases") are not prone to this situation because, if bad data
is corrected, the entire data warehouse can be reloaded from scratch and the problem
fixed very quickly. This is one of the main advantages of "one shot" data warehouses,
although of course they do suffer from several disadvantages too, such as their
limited ability to hold historic data.
Natural and surrogate keys
In Kimball's view of a data mart, all the natural keys remain part of the dimension
although they are no longer the primary key, nor are they used to join to the fact.
Instead, you should use as table identifiers surrogate keys, that is, simple integer
values with the only scope to connect facts and dimension. This gives us complete
freedom in the data mart to add to or redefine a natural key's meaning and,
importantly, the usage of the smallest possible integer type for surrogate keys
will lead to a smaller fact table.
 
Search WWH ::




Custom Search