Chapter 15
Data Partitioning
The amount of data stored in relational databases is growing exponentially every year. Customers are collecting more
data, and processing and retaining it for longer periods of time. We, as database professionals, find ourselves working
with databases that have become larger over time.
From a development standpoint, database size is not that critical; non-optimized queries time out regardless
of the database size. From a database administration standpoint, however, the management of large databases
introduces additional challenges. Data partitioning helps to address some of these challenges.
In this chapter, we will discuss the reasons to partition data, and we will cover the different data partitioning
techniques that are available in SQL Server. We will focus on practical implementation scenarios and typical data
partitioning use cases in SQL Server.
Reasons to Partition Data
Let's assume that our system stores data in a large non-partitioned table. This approach dramatically simplifies
development. All data is in the same place, and you can read the data from and write the data to the same table.
With such a design, however, all of the data is stored in the same location. The table resides in a single filegroup,
which consists of one or more files stored on the same disk array. Although, technically speaking, you could
spread indexes across different filegroups or data files across different disk arrays, doing so introduces additional
database management challenges, reduces the recoverability of data in case of a disaster, and rarely helps with
performance.
At the same time, in almost every system, the data stored in large tables can be separated into two different
categories: operational and historical. The first category consists of the data for the current operational period of the
company and serves most of the customers' requests against the table. Historical data, on the other hand, belongs to
older operational periods, which the system must retain for various reasons, such as regulations and business
requirements, among others.
Most activity in the table is performed against operational data, even though it can be very small compared to
the total table size. Obviously, it would be beneficial to store operational data on a fast and expensive disk array.
Historical data, on the other hand, does not need such I/O performance.
When data is not partitioned, you cannot separate it between disk arrays. You either have to pay extra for fast
storage you do not need, or compromise and buy larger but slower storage. Unfortunately, the latter option occurs
more often in the real world.
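The separation described above can be sketched with a partitioned table whose partitions are mapped to filegroups residing on different storage tiers. The object names, boundary dates, and filegroups below are hypothetical, assuming yearly operational periods and two filegroups, FG_Slow and FG_Fast, created in advance on slow and fast disk arrays respectively:

```sql
-- Partition the data by OrderDate into yearly ranges.
CREATE PARTITION FUNCTION pfOrderDate (datetime)
AS RANGE RIGHT FOR VALUES ('2013-01-01', '2014-01-01');

-- Map historical partitions to the slow storage tier and the
-- current (operational) partition to the fast one.
CREATE PARTITION SCHEME psOrderDate
AS PARTITION pfOrderDate
TO ([FG_Slow], [FG_Slow], [FG_Fast]);

-- Create the table on the partition scheme rather than on a
-- single filegroup.
CREATE TABLE dbo.Orders
(
    OrderId int NOT NULL,
    OrderDate datetime NOT NULL,
    Amount money NOT NULL
)
ON psOrderDate(OrderDate);
```

With this layout, rows are placed on a storage tier based on their OrderDate value, so only the operational partition consumes the expensive fast storage.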
It is also common for operational and historical data to have different workloads. Operational data usually
supports OLTP, the customer-facing part of the system. Historical data is mainly used for analysis and reporting.
These two workloads produce different sets of queries, which would benefit from a different set of indexes.
Unfortunately, it is almost impossible to index just a subset of the data in a table. Even though you can use filtered
indexes and/or indexed views, both approaches have several limitations. In most cases, you have to create a set of
indexes covering both workloads across the entire table. This requires additional storage space and introduces update
overhead for operational activity in the system. Moreover, volatile operational data requires different and more
frequent index maintenance than static historical data, which is impossible to implement in such a case.
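As a brief illustration of one of the approaches mentioned above, a filtered index can restrict an index to the operational portion of the data. The table and column names here are hypothetical, continuing the operational/historical example with a fixed boundary date:

```sql
-- A filtered index covering only operational rows; historical rows
-- do not consume space in this index or incur update overhead for it.
CREATE NONCLUSTERED INDEX IDX_Orders_CustomerId_Operational
ON dbo.Orders (CustomerId)
INCLUDE (Amount)
WHERE OrderDate >= '2014-01-01';
```

Note one of the limitations referenced above: the query optimizer can use a filtered index only when it can guarantee the query touches rows within the filter predicate, which often fails with parameterized queries. The boundary date is also hardcoded, so it must be redefined as the operational period moves forward.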