Denormalized or overnormalized models
Relational database design involves a process of eliminating redundancy. Eliminating redundancy is important to ensure that a database provides high performance and is flexible and maintainable. The process of eliminating redundancy in a data model is called normalization, and it can be carried out to progressively stricter levels known as normal forms (first, second, third normal form, and so on). Raw data in tabular form generally contains a great deal of redundancy and is therefore considered denormalized. Normalization is a complex topic that we won't discuss in great detail here.
When a database is first being designed, the raw data is analyzed for redundancy. A database administrator, a data modeler, or even a developer will take the raw data and normalize it using a collection of specific rules that are intended to eliminate redundancy. A denormalized relational model will contain redundant data in a few tables, each with a lot of rows and columns. A normalized model will have minimal or no redundancy and will have a greater number of tables, but each table will have fewer rows and columns.
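To make the contrast concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table and column names (`orders_denorm`, `customers`, `orders`) are hypothetical illustrations, not schemas from any particular system: the denormalized table repeats each customer's details on every order row, while the normalized pair stores each customer fact exactly once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized: customer details are repeated on every order row.
cur.execute("""CREATE TABLE orders_denorm (
    order_id INTEGER PRIMARY KEY,
    customer_name TEXT,
    customer_city TEXT,
    item TEXT)""")
cur.executemany(
    "INSERT INTO orders_denorm VALUES (?, ?, ?, ?)",
    [(1, "Ada", "London", "book"),
     (2, "Ada", "London", "pen"),       # 'Ada'/'London' stored again
     (3, "Grace", "New York", "lamp")])

# Normalized: each customer fact lives in exactly one row,
# and orders reference it by key. More tables, narrower rows.
cur.execute("""CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name TEXT,
    city TEXT)""")
cur.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    item TEXT)""")
cur.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "Ada", "London"), (2, "Grace", "New York")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, "book"), (2, 1, "pen"), (3, 2, "lamp")])

# The denormalized table holds 'London' twice; the normalized one, once.
denorm_count = cur.execute(
    "SELECT COUNT(*) FROM orders_denorm WHERE customer_city = 'London'"
).fetchone()[0]
norm_count = cur.execute(
    "SELECT COUNT(*) FROM customers WHERE city = 'London'"
).fetchone()[0]
print(denorm_count, norm_count)  # 2 1
```

If Ada moves, the normalized model needs one update; the denormalized model must update every order row that repeats her city, which is exactly the kind of redundancy normalization removes.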
There is no perfect level of normalization. Denormalization does have advantages in terms of simplicity and sometimes performance as well. A denormalized model can allow data to be stored and retrieved more quickly than if the data were normalized, simply because there are fewer statements to issue, fewer joins to calculate, and generally less overhead. That said, denormalization should always be the exception and not the rule. A good approach to database design is to begin with a "by the book" normalized model; the model can then be denormalized as needed. It is much easier to denormalize a database after the fact than it is to renormalize it, so always start new database designs with a normalized model.
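The "fewer joins" point can be shown with a small `sqlite3` sketch. The schemas here are hypothetical: retrieving a customer's name for an order takes a join in the normalized model but a single-table lookup in the denormalized one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    -- Normalized pair: orders reference customers by key.
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(customer_id),
                         item TEXT);
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 'book');

    -- Denormalized equivalent: the customer name rides along on the order.
    CREATE TABLE orders_denorm (order_id INTEGER PRIMARY KEY,
                                customer_name TEXT, item TEXT);
    INSERT INTO orders_denorm VALUES (10, 'Ada', 'book');
""")

# Normalized: answering "who placed order 10?" requires a join.
normalized_name = cur.execute("""
    SELECT c.name
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    WHERE o.order_id = 10""").fetchone()[0]

# Denormalized: the same answer is a single-table read.
denormalized_name = cur.execute(
    "SELECT customer_name FROM orders_denorm WHERE order_id = 10"
).fetchone()[0]

print(normalized_name, denormalized_name)  # Ada Ada
```

The denormalized read is simpler and avoids join overhead, which is why selective denormalization can help read-heavy workloads, but it reintroduces the redundancy problem shown earlier.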
It is possible to overnormalize a database, and the results can be problematic. Too many tables create a lot of relationships that need to be managed. This can mean many table joins when querying data, and it means multiple update statements are required to change data that is very closely related. Both of these characteristics can have a negative impact on performance. It also makes the schema harder to map to an object model, because you may not want classes as fine-grained as the data model's tables.
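A sketch of the multiple-update problem, again with hypothetical tables: here a single logical entity (a person) has been split across two one-to-one tables, so one conceptual change requires two UPDATE statements, ideally wrapped in a transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Overnormalized: one logical entity split across two one-to-one tables.
cur.execute("CREATE TABLE person_name  (person_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE person_email (person_id INTEGER PRIMARY KEY, email TEXT)")
cur.execute("INSERT INTO person_name  VALUES (1, 'Ada')")
cur.execute("INSERT INTO person_email VALUES (1, 'ada@old.example')")

# One logical change now takes two statements; the `with conn:` block
# commits them together so the entity is never half-updated.
with conn:
    cur.execute("UPDATE person_name  SET name  = ? WHERE person_id = ?",
                ("Ada L.", 1))
    cur.execute("UPDATE person_email SET email = ? WHERE person_id = ?",
                ("ada@new.example", 1))

# Even a simple read of the whole entity needs a join.
row = cur.execute("""
    SELECT n.name, e.email
    FROM person_name n
    JOIN person_email e USING (person_id)
    WHERE n.person_id = 1""").fetchone()
print(row)  # ('Ada L.', 'ada@new.example')
```

Had name and email lived in one `person` table, both the update and the read would be single statements, which is the argument for not splitting tables more finely than the data genuinely requires.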
Denormalized models are problematic too, possibly more so than overnormalized models. Denormalized models tend to have more rows and columns. Having too many rows impacts performance negatively in that there is simply more data to search through. Having too many columns is similar in that each row is bigger and therefore requires more resources to work with each time an update or a query is performed. Care must be taken with these wide tables to ensure that only columns