Introduction to Columnstore Indexes - Pro SQL Server Internals

Database Reference

In-Depth Information

The columns in the output represent the following:

column_id is the ID of a column in the index.

dictionary_id is ID of a dictionary.

version represents a dictionary format. Both SQL Server 2012 and 2014 return 1 as its value.

type represents the type of values stored in a dictionary. It can have one of the following

three values:

Dictionary contains int values is specified by type = 1

Dictionary contains string values is specified by type = 3

Dictionary contains float values is specified by type = 4

last_id is a last data id in a dictionary .

entry_count contains the number of entries in a dictionary .

on_disk_size indicates the size of a dictionary in bytes.

Design Considerations and Best Practices for

Columnstore Indexes

The subject of designing efficient Data Warehouse solutions is very broad and impossible to cover completely in this

book. However, it is equally impossible to avoid such discussion entirely.

Reducing Data Row Size

Regardless of the indexing technologies in use, most I/O activity in Data Warehouse systems is related to scanning

facts tables' data. Efficient design of facts tables is one of the key factors in Data Warehouse performance.

It is always advantageous to reduce the size of a data row, and it is even more critical in the case of facts tables

in Data Warehouses. By making data rows smaller, we reduce the size of the table on-disk and the number of I/O

operations during a scan. Moreover, it reduces the memory footprint of the data and makes batch-mode execution

more efficient due to better utilization of the internal CPU cache.

As you will remember, one of the key factors in reducing data size is the usage of correct data types for values.

You can think about storing Boolean values in int data types, or using datetime when a value requires just up to the

minute precision, as examples of bad design. Always use the smallest data type that can store column values and that

provides the required precision for the data.

Giving SQL Server as Much Information as Possible

Knowledge is power. The more SQL Server knows about the data, the better the chances that an efficient execution

plan is generated.

Unfortunately, nullability of columns is one of the most obvious but frequently overlooked factors. Defining

columns as NOT NULL when appropriate helps Query Optimizer and, in some cases, reduces the storage space

required for the data. It also allows SQL Server to avoid unnecessary encoding in columnstore indexes and during

batch-mode execution.

Search WWH ::

Custom Search

Home