Clustered Columnstore Indexes - Pro SQL Server Internals

Database Reference

In-Depth Information

Design Considerations

The choice between columnstore and B-Tree indexes depends on several factors. The most important factor, however,

is the type of workload in the system. Both types of indexes are targeted to different use-cases, and each has its own

set of strength and weaknesses.

Columnstore indexes shine with Data Warehouse workloads when queries need to scan a large amount of data

in a table. However, they are not a good fit for cases where you need to select one or a handful of rows using Singleton

Lookup or small Range Scan operations. An Index Scan is the only access method supported by columnstore indexes,

and SQL Server will scan the data even if your query needs to select a single row from a table. The amount of data to

scan can be reduced by partition and segment eliminations. In either case, however, a scan would be far less efficient

than the use of a B-Tree Index Seek operation.

Most large Data Warehouse systems would benefit from columnstore indexes, even though their implementation

requires some work in order to get the most from them. You often need to change a database schema to fit into star

or snowflake patterns better and/or to normalize facts tables and remove string attributes from them. In the case of

SQL Server 2012, you need to change ETL processes to address the read-only limitation of nonclustered columnstore

indexes, and you must often refactor queries to utilize batch-mode execution.

Clustered columnstore indexes simplify the conversion process. You can continue to use existing ETL processes

and insert data directly into facts tables. There is a hidden danger in this approach, however. Even though clustered

columnstore indexes are fully updateable, they are optimized for large bulk load operations. As you have seen,

excessive data modifications and a large number of partially populated row groups could and will negatively affect the

performance of queries. In the end, you should either fine tune ETL processes or frequently rebuild indexes to avoid

such performance overhead. In some cases, especially with frequently modified or deleted data, you need to rebuild

indexes on a regular basis, regardless of the quality of the ETL processes.

Table partitioning becomes a must have element in this scenario. It allows you to perform index maintenance in

the partition scope, which can dramatically reduce the overhead of such operations. It also allows you to save storage

space and reduce the solution cost by implementing archival compression on the partitions that store old data.

The question of columnstore index usage in OLTP environments is more complex than it may seem. Even though

tables with clustered columnstore indexes are updateable, they are not good candidates for active and volatile OLTP

data. Unfortunately, performance issues are easy to overlook at the beginning of development; after all, any solution

performs good enough with a small amount of data. However, as the amount of data increases, performance issues

become noticeable and force the refactoring of systems.

Nevertheless, there are some legitimate cases for columnstore indexes even in OLTP environments. Almost

all OLTP systems provide some reporting and analysis capabilities to customers. In some cases, you may consider

using columnstore indexes on tables that store old and static historical data using regular B-Tree tables for volatile

operational data. You can combine data from all tables with a partitioned view, hiding the data layout from the client

applications.

One such example is a SaaS (Software as a Service) installation that collects some data and provides basic

Self-Service Business Intelligence to customers. Even though a Self-Service Business Intelligence workload fits into

the Data Warehouse profile, it is not always possible or practical to implement a separate Data Warehouse solution.

An implementation that uses data partitioning and utilizes columnstore indexes on historical data may work here.

Nevertheless, it will require a complex and thoughtful design process, deep knowledge of the system workload, and

considerable effort to implement.

In some cases, especially if data is static and read-only, nonclustered columnstore indexes could be a better

choice than clustered columnstore indexes. Even though they require extra storage space for B-Tree representation of

the data, you can benefit from regular B-Tree indexes to support some use-cases and queries.

Finally, it is worth remembering that columnstore indexes are an Enterprise Edition feature only. Moreover, they

are not a transparent feature, as is data compression and table partitioning, which can be removed from the database

relatively easily if necessary. Implementation of columnstore indexes leads to specific database schema and code

patterns, which can be less efficient in the case of B-Tree indexes. Think about over-normalization of facts tables,

changes in ETL processes, and batch-mode execution query refactoring as examples of those patterns.

Pro SQL Server Internals

Search WWH ::

Custom Search

Home