Database Reference
In-Depth Information
A snapshot is the recording of a specific situation at a point in time. If we record in
a fact table the total amount of sales for each specific product, we are not recording
an event, but a specific situation. Snapshots can also be useful when we want to
measure something not directly related to any other fact. If we want to rank out
customers based on sales or payments, for example, we may want to store snapshots
of this data in order to analyze how these rankings change over time in response to
marketing campaigns.
Using a snapshot table containing aggregated data instead of a transaction table
can drastically reduce the number of rows in our fact table, which in turn leads to
smaller cubes, faster cube processing, and faster querying. The price we pay for this
is the loss of any information that can only be stored at the transaction level and
cannot be aggregated up into the snapshot, such as the transaction number data we
encountered when discussing degenerate dimensions. Whether this is an acceptable
price to pay is a question only the end users can answer.
Updating fact and dimension tables
In an ideal world, data that is stored in the data warehouse would never change.
Some topics suggest that we should only support insert operations in a data
warehouse, not updates; data comes from the OLTP, is cleaned, and is then stored
in the data warehouse until the end of time, and should never change because it
represents the situation at the time of insertion.
Nevertheless, the real world is somewhat different to the ideal one. While some
updates are handled by the slowly changing dimension techniques already
discussed, there are other kinds of updates needed in the life of a data warehouse.
In our experience, these other types of update in the data warehouse are needed
fairly regularly and are of the following two main kinds:
Structural updates : When the data warehouse is up and running, we will
need to perform updates to add information such as new measures or new
dimension attributes. This is normal in the lifecycle of a BI solution.
Data updates : We need to update data that has already been loaded into
the data warehouse, because it is wrong. We need to delete the old data and
enter the new data, as the old data will inevitably lead to confusion. There
are many reasons why bad data comes to the data warehouse; the sad reality
is that bad data happens and we need to manage it gracefully.
 
Search WWH ::




Custom Search