Database Reference
In-Depth Information
Now, how do these kinds of updates interact with fact and dimension tables?
Let's summarize briefly what the physical distinctions between fact and dimension
tables are:
Dimension tables are normally small, usually with less than 1 million rows,
and very frequently much less than that.
Fact tables are often very large; they can have up to hundreds of millions
or even billions of rows. Fact tables may be partitioned, and loading data
into them is usually the most time-consuming operation in the whole of the
data warehouse.
Structural updates on dimension tables are very easy to make, provided that they
can be incorporated in the existing SCD type without affecting granularity. You
simply update the table with the new metadata, make the necessary changes to your
ETL procedures, and the next time they are run the dimension will reflect the new
values. If your users decide that they want to analyze data based on a new attribute
on, say, the Customer dimension, then the new attribute can be added for all of
the customers in the dimension. Moreover, if the attribute is not present for some
customers, then they can be assigned a default value; after all, updating one million
rows is not a difficult task for SQL Server or any other modern relational database.
However, even if updating the relational model is simple, the updates need to go
through to Analysis Services and this might result in the need for a full process of the
dimension and therefore the cube, which might be very time consuming.
On the other hand, structural updates may be a huge problem on fact tables. The
problem is not that of altering the metadata, but determining and assigning a default
value for the large number of rows that are already stored in the fact table. It's easy
to insert data into fact tables. However, creating a new field with a default value
would result in an UPDATE command that will probably run for hours and might
even bring down your database server. Worse, if we do not have a simple default
value to assign, then we will need to calculate the new value for each row in the
fact table, and so the update operation will take even longer. We have found that it
is often better to reload the entire fact table rather than perform an update on it. Of
course, in order to reload the fact table, you need to have all of your source data at
hand and this is not always possible.
 
Search WWH ::




Custom Search