Chapter 15
Data Partitioning
The amount of data stored in relational databases is growing exponentially every year. Customers are collecting more
data, and processing and retaining it for longer periods of time. We, as database professionals, find ourselves working
with databases that have become larger over time.
From a development standpoint, database size is not that critical; non-optimized queries time out regardless
of the database size. From a database administration standpoint, however, the management of large databases
introduces additional challenges. Data partitioning helps to address some of these challenges.
In this chapter, we will discuss the reasons to partition data, and we will cover the different data partitioning
techniques that are available in SQL Server. We will focus on practical implementation scenarios and typical data
partitioning use cases in SQL Server.
Reasons to Partition Data
Let's assume that our system stores data in a large non-partitioned table. This approach dramatically simplifies
development. All data is in the same place, and you can read the data from and write the data to the same table.
With such a design, however, all of the data is stored in the same location. The table resides in a single filegroup,
which consists of one or more files stored on the same disk array. Although, technically speaking, you could
spread indexes across different filegroups or data files across different disk arrays, doing so introduces additional
database management challenges, reduces the recoverability of data in case of a disaster, and rarely helps with
performance.
At the same time, in almost every system, the data stored in large tables can be separated into two different
categories: operational and historical. The first category consists of the data for the current operational period of the
company and serves most of the customers' requests against the table. Historical data, on the other hand, belongs to
older operational periods, which the system must retain for various reasons, such as regulations and business
requirements, among others.
Most activity in the table is performed against operational data, even though it can be very small compared to
the total table size. Obviously, it would be beneficial to store operational data on a fast and expensive disk array.
Historical data, on the other hand, does not need such I/O performance.
When data is not partitioned, you cannot separate it between disk arrays. You either have to pay extra for fast
storage you do not need, or compromise and buy larger but slower storage. Unfortunately, the latter option occurs
more often in the real world.
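The separation described above can be sketched with a partitioned table whose partitions are mapped to filegroups residing on different storage tiers. The object names, boundary dates, and filegroups below are hypothetical, assuming yearly operational periods and two filegroups, FG_Slow and FG_Fast, created in advance on slow and fast disk arrays respectively:

```sql
-- Partition the data by OrderDate into yearly ranges.
CREATE PARTITION FUNCTION pfOrderDate (datetime)
AS RANGE RIGHT FOR VALUES ('2013-01-01', '2014-01-01');

-- Map historical partitions to the slow storage tier and the
-- current (operational) partition to the fast one.
CREATE PARTITION SCHEME psOrderDate
AS PARTITION pfOrderDate
TO ([FG_Slow], [FG_Slow], [FG_Fast]);

-- Create the table on the partition scheme rather than on a
-- single filegroup.
CREATE TABLE dbo.Orders
(
    OrderId int NOT NULL,
    OrderDate datetime NOT NULL,
    Amount money NOT NULL
)
ON psOrderDate(OrderDate);
```

With this layout, rows are placed on a storage tier based on their OrderDate value, so only the operational partition consumes the expensive fast storage.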
It is also common for operational and historical data to have different workloads. Operational data usually
supports OLTP, the customer-facing part of the system. Historical data is mainly used for analysis and reporting.
These two workloads produce different sets of queries, which would benefit from a different set of indexes.
Unfortunately, it is almost impossible to index just a subset of the data in a table. Even though you can use filtered
indexes and/or indexed views, both approaches have several limitations. In most cases, you have to create a set of
indexes covering both workloads across the entire table. This requires additional storage space and introduces update
overhead for operational activity in the system. Moreover, volatile operational data requires different and more
frequent index maintenance than static historical data, which is impossible to implement in such a case.
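As a brief illustration of one of the approaches mentioned above, a filtered index can restrict an index to the operational portion of the data. The table and column names here are hypothetical, continuing the operational/historical example with a fixed boundary date:

```sql
-- A filtered index covering only operational rows; historical rows
-- do not consume space in this index or incur update overhead for it.
CREATE NONCLUSTERED INDEX IDX_Orders_CustomerId_Operational
ON dbo.Orders (CustomerId)
INCLUDE (Amount)
WHERE OrderDate >= '2014-01-01';
```

Note one of the limitations referenced above: the query optimizer can use a filtered index only when it can guarantee the query touches rows within the filter predicate, which often fails with parameterized queries. The boundary date is also hardcoded, so it must be redefined as the operational period moves forward.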