Designing the Data Warehouse for Analysis Services - Expert Cube Development with SSAS Multidimensional Models

Database Reference

In-Depth Information

A snapshot is the recording of a specific situation at a point in time. If we record in

a fact table the total amount of sales for each specific product, we are not recording

an event, but a specific situation. Snapshots can also be useful when we want to

measure something not directly related to any other fact. If we want to rank out

customers based on sales or payments, for example, we may want to store snapshots

of this data in order to analyze how these rankings change over time in response to

marketing campaigns.

Using a snapshot table containing aggregated data instead of a transaction table

can drastically reduce the number of rows in our fact table, which in turn leads to

smaller cubes, faster cube processing, and faster querying. The price we pay for this

is the loss of any information that can only be stored at the transaction level and

cannot be aggregated up into the snapshot, such as the transaction number data we

encountered when discussing degenerate dimensions. Whether this is an acceptable

price to pay is a question only the end users can answer.

Updating fact and dimension tables

In an ideal world, data that is stored in the data warehouse would never change.

Some topics suggest that we should only support insert operations in a data

warehouse, not updates; data comes from the OLTP, is cleaned, and is then stored

in the data warehouse until the end of time, and should never change because it

represents the situation at the time of insertion.

Nevertheless, the real world is somewhat different to the ideal one. While some

updates are handled by the slowly changing dimension techniques already

discussed, there are other kinds of updates needed in the life of a data warehouse.

In our experience, these other types of update in the data warehouse are needed

fairly regularly and are of the following two main kinds:

•

Structural updates : When the data warehouse is up and running, we will

need to perform updates to add information such as new measures or new

dimension attributes. This is normal in the lifecycle of a BI solution.

•

Data updates : We need to update data that has already been loaded into

the data warehouse, because it is wrong. We need to delete the old data and

enter the new data, as the old data will inevitably lead to confusion. There

are many reasons why bad data comes to the data warehouse; the sad reality

is that bad data happens and we need to manage it gracefully.

Search WWH ::

Custom Search

Home