Similarly, overnormalization is not good for query performance. Overnormalization causes excessive joins across
too many narrow tables. Misestimations of cardinality in one table can seriously affect a large number of others as
they get joined. Although a 20-table join can perform perfectly well and a 2-table join can be a problem, a good rule
of thumb is to examine a query more closely when it exceeds 8 to 12 tables in the join criteria. That is not to say that
anything below that number is good and anything above it is bad; however, this number of joins should act as a
flag for evaluation. In an overnormalized database, a developer has to join a large number of tables in the SQL queries
to fetch any useful content. For example, if you create separate tables for a customer name, address, and phone number,
then you will have to join at least three tables to retrieve the customer information. If the data (for example, the
customer name and address) has a one-to-one type of relationship and is usually accessed together by the queries,
then normalizing the data into separate tables can hurt query performance.
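To make the cost concrete, here is a minimal sketch of that customer example. The table and column names are hypothetical, chosen only for illustration; even listing a single customer's contact details already requires a three-table join.

-- Hypothetical overnormalized design: name, address, and phone number
-- split into separate one-to-one tables.
CREATE TABLE dbo.CustomerName (
    CustomerID INT NOT NULL PRIMARY KEY,
    FirstName NVARCHAR(50) NOT NULL,
    LastName NVARCHAR(50) NOT NULL
);

CREATE TABLE dbo.CustomerAddress (
    CustomerID INT NOT NULL PRIMARY KEY
        REFERENCES dbo.CustomerName (CustomerID),
    Address NVARCHAR(200) NOT NULL
);

CREATE TABLE dbo.CustomerPhone (
    CustomerID INT NOT NULL PRIMARY KEY
        REFERENCES dbo.CustomerName (CustomerID),
    PhoneNumber NVARCHAR(25) NOT NULL
);

-- Retrieving basic customer information now costs two joins.
SELECT cn.FirstName,
    cn.LastName,
    ca.Address,
    cp.PhoneNumber
FROM dbo.CustomerName AS cn
    JOIN dbo.CustomerAddress AS ca
        ON ca.CustomerID = cn.CustomerID
    JOIN dbo.CustomerPhone AS cp
        ON cp.CustomerID = cn.CustomerID
WHERE cn.CustomerID = 42;

A single dbo.Customer table holding all three pieces of data would answer the same question with no joins at all.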
Benefiting from Entity-Integrity Constraints
Data integrity is essential to ensuring the quality of data in the database. An essential component of data integrity
is entity integrity, which defines a row as a unique entity for a particular table; that is, every row in a table must be
uniquely identifiable. The column or columns serving as the unique row identifier for a table must be represented as
the primary key of the table.
Sometimes, a table may contain an additional column (or columns) that also can be used to uniquely identify
a row in the table. For example, an Employee table may have the EmployeeID and SocialSecurityNumber columns.
The EmployeeID column serves as the unique row identifier, and it can be defined as the primary key. Similarly, the
SocialSecurityNumber column can be defined as the alternate key. In SQL Server, alternate keys can be defined using
unique constraints, which are essentially the younger siblings to primary keys. In fact, both the unique constraint and
the primary key constraint use unique indexes behind the scenes.
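As a sketch of that Employee example (the table definition below is illustrative, not taken from any particular sample database), the primary key and the alternate key can be declared together, and SQL Server will build a unique index behind each constraint.

-- Hypothetical table: EmployeeID is the primary key, and the
-- SocialSecurityNumber alternate key is enforced with a UNIQUE constraint.
CREATE TABLE dbo.Employee (
    EmployeeID INT IDENTITY(1, 1) NOT NULL,
    SocialSecurityNumber CHAR(11) NOT NULL,
    FirstName NVARCHAR(50) NOT NULL,
    LastName NVARCHAR(50) NOT NULL,
    CONSTRAINT PK_Employee PRIMARY KEY CLUSTERED (EmployeeID),
    CONSTRAINT AK_Employee_SocialSecurityNumber
        UNIQUE NONCLUSTERED (SocialSecurityNumber)
);

If you query sys.indexes for this table after creating it, you will find a unique index backing each of the two constraints.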
It's worth noting that there is honest disagreement regarding the use of a natural key (for example, the
SocialSecurityNumber column in the previous example) or an artificial key (for example, the EmployeeID column).
I've seen both designs succeed, but each approach has strengths and weaknesses. Rather than suggest one over the
other, I'll give you a couple of reasons to use each and some of the costs associated with each, and thereby avoid
the religious argument. An identity column is usually an INT or a BIGINT, which makes it narrow and easy to index,
improving performance. Also, separating the value of the primary key from any business knowledge is considered
good design in some circles. One of the drawbacks of this approach is that the numbers sometimes acquire business
meaning, which should never happen. Another thing to keep in mind is that you have to create a unique constraint
for the alternate keys to prevent the creation of multiple rows where none should exist. This increases the amount of
information you have to store and maintain. Natural keys provide a clear, human-readable primary key that has true
business meaning. They tend to be wider fields—sometimes very wide—making them less efficient inside indexes. Also,
sometimes the data may change, which has a profound trickle-down effect within your database and your enterprise.
Let me just reiterate that either approach can work well and that each provides plenty of opportunities for tuning.
Either approach, properly applied and maintained, will protect the integrity of your data.
Besides maintaining data integrity, unique indexes—the primary vehicle for entity-integrity constraints—help
the optimizer generate efficient execution plans. SQL Server can often search through a unique index faster than it can
search through a nonunique index. This is because each row in a unique index is unique; once a row is found,
SQL Server does not have to look any further for other matching rows (the optimizer is aware of this fact). If a column
is used in sort (or GROUP BY or DISTINCT) operations, consider defining a unique constraint on the column (using a
unique index), because columns with a unique constraint generally sort faster than those without one.
To understand the performance benefit of entity-integrity or unique constraints, consider an example. Assume
you want to modify the existing unique index on the Production.Product table, rebuilding it as a nonunique index.
CREATE NONCLUSTERED INDEX [AK_Product_Name]
ON [Production].[Product] ([Name] ASC)
WITH (DROP_EXISTING = ON)
ON [PRIMARY];
GO
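The rebuilt index is no longer unique, so the optimizer loses the guarantee that each product name appears only once. One way to observe the difference is to compare the execution plan of a query against the Name column before and after the change, and then restore the unique index; the test query and the original index definition below are my assumptions, not necessarily the ones this example continues with.

-- With the unique index in place, the optimizer knows every Name is distinct
-- and does not need extra work to satisfy DISTINCT; compare the plans.
SELECT DISTINCT p.[Name]
FROM [Production].[Product] AS p;
GO

-- Re-create the index as unique again (assumes the original definition).
CREATE UNIQUE NONCLUSTERED INDEX [AK_Product_Name]
ON [Production].[Product] ([Name] ASC)
WITH (DROP_EXISTING = ON)
ON [PRIMARY];
GO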
 