Database Reference
In-Depth Information
Consider a bigint column as an example. When this column is defined as NOT NULL , the value fits into a single
CPU register and, therefore, operations on the value can be performed more quickly. Alternatively, a nullable bigint
column requires another 65th bit to indicate NULL values. When this is the case, SQL Server avoids cross-register data
storage by storing some of the row values (usually the highest or lowest values) in main memory using special markers
to indicate it in the data that resides in the CPU cache. As you can probably guess, this approach adds extra load
during execution. As a general rule, it is better to avoid nullable columns in Data Warehouse environments. It is also
beneficial to use CHECK constraints and UNIQUE constraints or indexes when overhead introduced by constraints or
unique indexes is acceptable.
Maintaining Statistics
Creating and maintaining statistics is a good practice that benefits any SQL Server system. As you know, up-to-date
statistics helps Query Optimizer generate more efficient execution plans.
Columnstore indexes behave differently than B-Tree indexes regarding statistics. SQL Server creates a statistics
object at the time of columnstore index creation; however, it is neither populated nor updated afterwards. SQL Server
relies on B-Tree indexes and column-level statistics while deciding if a columnstore index needs to be used.
It is beneficial to create missing column-level statistics on the columns that participate in a columnstore index
and are used in query predicates and as join keys.
Remember to update statistics, keeping them up-to-date after you load new data to a Data Warehouse. Statistics
rarely update automatically on very large tables.
Avoiding String Columns in Fact Tables
Generally, you should minimize the usage of string columns in facts tables. String data uses more space, and SQL
Server performs extra encoding when working with such data during batch-mode execution. Moreover, queries with
predicates on string columns may have less efficient execution plans as compared to their non-string counterparts.
SQL Server does not always push string predicates down towards the lowest operators in execution plans.
Let's look at an example of such behavior. The code shown in Listing 34-14 adds an ArticleCategory column
to the FactSales table, populating it with values from the DimArticles table. As a final step, the code recreates the
columnstore index, adding a new column there. Obviously, you should not design database schema this way, and you
should not keep redundant attributes in a facts table in production.
Listing 34-14. String columns in facts tables: Table schema changes
drop index IDX_FactSales_ColumnStore on dbo.FactSales
go
alter table dbo.FactSales
add ArticleCategory nvarchar(32) not null default ''
go
update t
set
t.ArticleCategory = a.ArticleCategory
from
dbo.FactSales t join dbo.DimArticles a on
t.ArticleId = a.ArticleId;
create nonclustered columnstore index IDX_FactSales_ColumnStore
on dbo.FactSales(DateId, ArticleId, BranchId, Quantity, UnitPrice, Amount, ArticleCategory);
 
Search WWH ::




Custom Search