Database Reference
In-Depth Information
Design Considerations
The choice between columnstore and B-Tree indexes depends on several factors. The most important factor, however,
is the type of workload in the system. Both types of indexes are targeted to different use-cases, and each has its own
set of strength and weaknesses.
Columnstore indexes shine with Data Warehouse workloads when queries need to scan a large amount of data
in a table. However, they are not a good fit for cases where you need to select one or a handful of rows using Singleton
Lookup or small Range Scan operations. An Index Scan is the only access method supported by columnstore indexes,
and SQL Server will scan the data even if your query needs to select a single row from a table. The amount of data to
scan can be reduced by partition and segment eliminations. In either case, however, a scan would be far less efficient
than the use of a B-Tree Index Seek operation.
Most large Data Warehouse systems would benefit from columnstore indexes, even though their implementation
requires some work in order to get the most from them. You often need to change a database schema to fit into star
or snowflake patterns better and/or to normalize facts tables and remove string attributes from them. In the case of
SQL Server 2012, you need to change ETL processes to address the read-only limitation of nonclustered columnstore
indexes, and you must often refactor queries to utilize batch-mode execution.
Clustered columnstore indexes simplify the conversion process. You can continue to use existing ETL processes
and insert data directly into facts tables. There is a hidden danger in this approach, however. Even though clustered
columnstore indexes are fully updateable, they are optimized for large bulk load operations. As you have seen,
excessive data modifications and a large number of partially populated row groups could and will negatively affect the
performance of queries. In the end, you should either fine tune ETL processes or frequently rebuild indexes to avoid
such performance overhead. In some cases, especially with frequently modified or deleted data, you need to rebuild
indexes on a regular basis, regardless of the quality of the ETL processes.
Table partitioning becomes a must have element in this scenario. It allows you to perform index maintenance in
the partition scope, which can dramatically reduce the overhead of such operations. It also allows you to save storage
space and reduce the solution cost by implementing archival compression on the partitions that store old data.
The question of columnstore index usage in OLTP environments is more complex than it may seem. Even though
tables with clustered columnstore indexes are updateable, they are not good candidates for active and volatile OLTP
data. Unfortunately, performance issues are easy to overlook at the beginning of development; after all, any solution
performs good enough with a small amount of data. However, as the amount of data increases, performance issues
become noticeable and force the refactoring of systems.
Nevertheless, there are some legitimate cases for columnstore indexes even in OLTP environments. Almost
all OLTP systems provide some reporting and analysis capabilities to customers. In some cases, you may consider
using columnstore indexes on tables that store old and static historical data using regular B-Tree tables for volatile
operational data. You can combine data from all tables with a partitioned view, hiding the data layout from the client
applications.
One such example is a SaaS (Software as a Service) installation that collects some data and provides basic
Self-Service Business Intelligence to customers. Even though a Self-Service Business Intelligence workload fits into
the Data Warehouse profile, it is not always possible or practical to implement a separate Data Warehouse solution.
An implementation that uses data partitioning and utilizes columnstore indexes on historical data may work here.
Nevertheless, it will require a complex and thoughtful design process, deep knowledge of the system workload, and
considerable effort to implement.
In some cases, especially if data is static and read-only, nonclustered columnstore indexes could be a better
choice than clustered columnstore indexes. Even though they require extra storage space for B-Tree representation of
the data, you can benefit from regular B-Tree indexes to support some use-cases and queries.
Finally, it is worth remembering that columnstore indexes are an Enterprise Edition feature only. Moreover, they
are not a transparent feature, as is data compression and table partitioning, which can be removed from the database
relatively easily if necessary. Implementation of columnstore indexes leads to specific database schema and code
patterns, which can be less efficient in the case of B-Tree indexes. Think about over-normalization of facts tables,
changes in ETL processes, and batch-mode execution query refactoring as examples of those patterns.
 
Search WWH ::




Custom Search