Introduction to Columnstore Indexes - Pro SQL Server Internals

Database Reference

In-Depth Information

Consider a bigint column as an example. When this column is defined as NOT NULL , the value fits into a single

CPU register and, therefore, operations on the value can be performed more quickly. Alternatively, a nullable bigint

column requires another 65th bit to indicate NULL values. When this is the case, SQL Server avoids cross-register data

storage by storing some of the row values (usually the highest or lowest values) in main memory using special markers

to indicate it in the data that resides in the CPU cache. As you can probably guess, this approach adds extra load

during execution. As a general rule, it is better to avoid nullable columns in Data Warehouse environments. It is also

beneficial to use CHECK constraints and UNIQUE constraints or indexes when overhead introduced by constraints or

unique indexes is acceptable.

Maintaining Statistics

Creating and maintaining statistics is a good practice that benefits any SQL Server system. As you know, up-to-date

statistics helps Query Optimizer generate more efficient execution plans.

Columnstore indexes behave differently than B-Tree indexes regarding statistics. SQL Server creates a statistics

object at the time of columnstore index creation; however, it is neither populated nor updated afterwards. SQL Server

relies on B-Tree indexes and column-level statistics while deciding if a columnstore index needs to be used.

It is beneficial to create missing column-level statistics on the columns that participate in a columnstore index

and are used in query predicates and as join keys.

Remember to update statistics, keeping them up-to-date after you load new data to a Data Warehouse. Statistics

rarely update automatically on very large tables.

Avoiding String Columns in Fact Tables

Generally, you should minimize the usage of string columns in facts tables. String data uses more space, and SQL

Server performs extra encoding when working with such data during batch-mode execution. Moreover, queries with

predicates on string columns may have less efficient execution plans as compared to their non-string counterparts.

SQL Server does not always push string predicates down towards the lowest operators in execution plans.

Let's look at an example of such behavior. The code shown in Listing 34-14 adds an ArticleCategory column

to the FactSales table, populating it with values from the DimArticles table. As a final step, the code recreates the

columnstore index, adding a new column there. Obviously, you should not design database schema this way, and you

should not keep redundant attributes in a facts table in production.

Listing 34-14. String columns in facts tables: Table schema changes

drop index IDX_FactSales_ColumnStore on dbo.FactSales

go

alter table dbo.FactSales

add ArticleCategory nvarchar(32) not null default ''

go

update t

set

t.ArticleCategory = a.ArticleCategory

from

dbo.FactSales t join dbo.DimArticles a on

t.ArticleId = a.ArticleId;

create nonclustered columnstore index IDX_FactSales_ColumnStore

on dbo.FactSales(DateId, ArticleId, BranchId, Quantity, UnitPrice, Amount, ArticleCategory);

Search WWH ::

Custom Search

Home