NORMALIZING A DATA MODEL - A Developer's Guide to Data Modeling for SQL Server

Database Reference

In-Depth Information

is the process of reversing the steps taken to achieve a normal form. Often,

it becomes necessary to violate certain normalization rules to satisfy the

real-world requirements of specific queries. Let's look at some examples.

In data models that have a completely normalized structure, there

tend to be a great many entities and relationships. To retrieve logical sets

of data, you often need a great many joins to retrieve all the pertinent in-

formation about a given object. Logically this is not a problem, but in the

physical implementation of a database, joins tend to incur overhead in

query processing time. For every table that is joined, there is usually a cost

to scan the indexes on that table and then retrieve the matching data from

each object, combine the resulting data, and deliver it to the end user (for

more on indexes and query optimization, see Chapter 10).

When millions of rows are being scanned and tens or hundreds of rows

are being returned, it is costly. In these situations, creating a denormalized

entity may offer a performance benefit, at the cost of violating one of the

normal forms. The trade-off is usually a matter of having redundant data,

because you are storing an additional physical table that duplicates data

being stored in other tables. To mitigate the storage effects of this tech-

nique, you can often store subsets of data in the duplicate table, clearing it

out and repopulating it based on the queries you know are running against

it. Additionally, this means that you have additional physical objects to

maintain if there are schema changes in the original tables. In this case, ac-

curate documentation and a managed change control process are the only

practices that can ensure that all the relevant denormalized objects stay in

sync.

Denormalization also can help when you're working on reporting ap-

plications. In larger environments, it is often necessary to generate reports

based on application data. Reporting queries often return large historical

data sets, and when you join various types of data in a single report it in-

curs a lot of overhead on standard OLTP systems. Running these queries

on exactly the same databases that the applications are trying to use can re-

sult in an overloaded system, creating blocking situations and causing end

users to wait an unacceptable amount of time for the data. Additionally, it

means storing large amounts of historical data in the OLTP system, some-

thing that may have other adverse effects, both internally to the database

management system and to the physical server resources.

Denormalizing the data in the database to a set of tables (or even to a

different physical database) specifically used for reporting can alleviate the

Search WWH ::

Custom Search

Home