Database Reference
In-Depth Information
Clearly, as the dimension and the fact tables are the same, the query will ask for
DISTINCT over a fact table; given that fact tables can be very large, the query might
take a long time to run. Nevertheless, if a degenerate dimension is needed and it is
stored in a fact table, then there is no other choice but to pay the price with this query.
Distinct Count measures
The last kind of query that we need to be aware of is when we have a measure group
containing a DISTINCT COUNT measure. In this case, due to the way Analysis Services
calculates distinct counts, the query to the fact table will be issued with ORDER BY for
the column we are performing the distinct count on.
Needless to say, this will lead to very poor performance because we are asking SQL
Server to sort a fact table on a column that is not part of the clustered index (usually,
the clustered index is built on the primary key). The pressure on the temporary
database will be tremendous and the query will take a lot of time.
There are some optimizations, mostly pertinent to partitioning, that need to be done
when we have DISTINCT COUNT measures in very big fact tables. What we want to
point out is that in this case a good knowledge of the internal behavior of Analysis
Services is necessary in order to avoid bad performance when processing.
Indexes in the data mart
The usage of indexes in data mart is a very complex topic and we cannot cover it all
in a simple section. Nevertheless, there are a few general rules that can be followed
for both fact and dimension tables.
Dimension tables
Dimension tables should have a primary clustered key based on an integer field,
which is the surrogate key.
Non-clustered indexes may be added for the natural key, in order to speed up the
ETL phase for Slowly Changing Dimensions. The key might be composed of the
natural key and the slowly changing dimension date of insertion. These indexes
might be defined as UNIQUE , but like any other constraint in the data mart, the
uniqueness should be enforced in development and disabled in production.
Fact tables
It is questionable whether fact tables should have a primary key or not. We prefer
to have a primary clustered key based on an integer field, because it makes it very
simple to identify a row in the case where we need to check for its value or update it.
 
Search WWH ::




Custom Search