Database Reference
In-Depth Information
The columns in the output represent the following:
column_id
is the ID of a column in the index.
dictionary_id
is ID of a dictionary.
version
represents a dictionary format. Both SQL Server 2012 and 2014 return 1 as its value.
type
represents the type of values stored in a dictionary. It can have one of the following
three values:
Dictionary contains int values is specified by
type = 1
Dictionary contains string values is specified by
type = 3
Dictionary contains float values is specified by
type = 4
last_id
is a last data id in a dictionary
.
entry_count
contains the number of entries in a dictionary
.
on_disk_size
indicates the size of a dictionary in bytes.
Design Considerations and Best Practices for
Columnstore Indexes
The subject of designing efficient Data Warehouse solutions is very broad and impossible to cover completely in this
book. However, it is equally impossible to avoid such discussion entirely.
Reducing Data Row Size
Regardless of the indexing technologies in use, most I/O activity in Data Warehouse systems is related to scanning
facts tables' data. Efficient design of facts tables is one of the key factors in Data Warehouse performance.
It is always advantageous to reduce the size of a data row, and it is even more critical in the case of facts tables
in Data Warehouses. By making data rows smaller, we reduce the size of the table on-disk and the number of I/O
operations during a scan. Moreover, it reduces the memory footprint of the data and makes batch-mode execution
more efficient due to better utilization of the internal CPU cache.
As you will remember, one of the key factors in reducing data size is the usage of correct data types for values.
You can think about storing Boolean values in
int
data types, or using
datetime
when a value requires just
up to the
minute
precision, as examples of bad design. Always use the smallest data type that can store column values and that
provides the required precision for the data.
Giving SQL Server as Much Information as Possible
Knowledge is power. The more SQL Server knows about the data, the better the chances that an efficient execution
plan is generated.
Unfortunately, nullability of columns is one of the most obvious but frequently overlooked factors. Defining
columns as
NOT NULL
when appropriate helps Query Optimizer and, in some cases, reduces the storage space
required for the data. It also allows SQL Server to avoid unnecessary encoding in columnstore indexes and during
batch-mode execution.