Designing the Data Warehouse for Analysis Services - Expert Cube Development with SSAS Multidimensional Models

Database Reference

In-Depth Information

Now, how do these kinds of updates interact with fact and dimension tables?

Let's summarize briefly what the physical distinctions between fact and dimension

tables are:

•

Dimension tables are normally small, usually with less than 1 million rows,

and very frequently much less than that.

•

Fact tables are often very large; they can have up to hundreds of millions

or even billions of rows. Fact tables may be partitioned, and loading data

into them is usually the most time-consuming operation in the whole of the

data warehouse.

Structural updates on dimension tables are very easy to make, provided that they

can be incorporated in the existing SCD type without affecting granularity. You

simply update the table with the new metadata, make the necessary changes to your

ETL procedures, and the next time they are run the dimension will reflect the new

values. If your users decide that they want to analyze data based on a new attribute

on, say, the Customer dimension, then the new attribute can be added for all of

the customers in the dimension. Moreover, if the attribute is not present for some

customers, then they can be assigned a default value; after all, updating one million

rows is not a difficult task for SQL Server or any other modern relational database.

However, even if updating the relational model is simple, the updates need to go

through to Analysis Services and this might result in the need for a full process of the

dimension and therefore the cube, which might be very time consuming.

On the other hand, structural updates may be a huge problem on fact tables. The

problem is not that of altering the metadata, but determining and assigning a default

value for the large number of rows that are already stored in the fact table. It's easy

to insert data into fact tables. However, creating a new field with a default value

would result in an UPDATE command that will probably run for hours and might

even bring down your database server. Worse, if we do not have a simple default

value to assign, then we will need to calculate the new value for each row in the

fact table, and so the update operation will take even longer. We have found that it

is often better to reload the entire fact table rather than perform an update on it. Of

course, in order to reload the fact table, you need to have all of your source data at

hand and this is not always possible.

Search WWH ::

Custom Search

Home