Designing the Data Warehouse for Analysis Services - Expert Cube Development with SSAS Multidimensional Models

Database Reference

In-Depth Information

Clearly, as the dimension and the fact tables are the same, the query will ask for

DISTINCT over a fact table; given that fact tables can be very large, the query might

take a long time to run. Nevertheless, if a degenerate dimension is needed and it is

stored in a fact table, then there is no other choice but to pay the price with this query.

Distinct Count measures

The last kind of query that we need to be aware of is when we have a measure group

containing a DISTINCT COUNT measure. In this case, due to the way Analysis Services

calculates distinct counts, the query to the fact table will be issued with ORDER BY for

the column we are performing the distinct count on.

Needless to say, this will lead to very poor performance because we are asking SQL

Server to sort a fact table on a column that is not part of the clustered index (usually,

the clustered index is built on the primary key). The pressure on the temporary

database will be tremendous and the query will take a lot of time.

There are some optimizations, mostly pertinent to partitioning, that need to be done

when we have DISTINCT COUNT measures in very big fact tables. What we want to

point out is that in this case a good knowledge of the internal behavior of Analysis

Services is necessary in order to avoid bad performance when processing.

Indexes in the data mart

The usage of indexes in data mart is a very complex topic and we cannot cover it all

in a simple section. Nevertheless, there are a few general rules that can be followed

for both fact and dimension tables.

Dimension tables

Dimension tables should have a primary clustered key based on an integer field,

which is the surrogate key.

Non-clustered indexes may be added for the natural key, in order to speed up the

ETL phase for Slowly Changing Dimensions. The key might be composed of the

natural key and the slowly changing dimension date of insertion. These indexes

might be defined as UNIQUE , but like any other constraint in the data mart, the

uniqueness should be enforced in development and disabled in production.

Fact tables

It is questionable whether fact tables should have a primary key or not. We prefer

to have a primary clustered key based on an integer field, because it makes it very

simple to identify a row in the case where we need to check for its value or update it.

Search WWH ::

Custom Search

Home