To generate a representative workload, representative test data is needed. Not only should
the number and size of the rows match the expected volume, but the data distribution and
content should also match real data. For example, if an attribute contains the name of a city,
it is much better to use real city names than character strings like Aaaacccc or Abcdefghij .
This matters because in both the application and the database there are many situations
where different data leads to different behavior (for example, in index access paths or when
a hash function is applied to the data).
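As a minimal sketch of this idea, the following Python snippet generates test rows whose city column follows a skewed, production-like distribution instead of uniform dummy strings. The city names and weights are hypothetical sample data, not taken from any real system:

```python
import random

# Hypothetical reference data: a few real city names with a rough
# population-based skew, instead of meaningless strings like "Abcdefghij".
CITIES = ["Zurich", "Geneva", "Basel", "Bern", "Lugano"]
WEIGHTS = [40, 25, 15, 12, 8]  # deliberately non-uniform

def make_test_rows(n, seed=42):
    rng = random.Random(seed)  # fixed seed so test runs are repeatable
    # random.choices samples with replacement according to the weights,
    # so frequent cities dominate just as they would in production data.
    return [(i, rng.choices(CITIES, weights=WEIGHTS)[0]) for i in range(n)]

rows = make_test_rows(10_000)
counts = {c: 0 for c in CITIES}
for _, city in rows:
    counts[city] += 1

# The most common city appears far more often than the rarest one --
# exactly the kind of skew that optimizer statistics and indexes react to.
print(counts)
```

With uniform dummy strings, every value would be equally selective; with skewed data like this, a predicate on "Zurich" and a predicate on "Lugano" behave very differently, which is what a stress test needs to expose.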
The test infrastructure should be as close as possible to, and ideally the same as, the
production infrastructure. This is especially difficult for highly distributed systems and
for systems that interact with a large number of other systems.
In a sequential life cycle model, the integration and acceptance testing phase occurs near the end of the
project, which is a problem if an architectural flaw that causes performance problems is detected only then.
To avoid this, stress tests should also be performed during the coding and unit testing phases.
Note that an iterative life cycle model does not have this problem. In fact, by the very definition of an iterative
life cycle model, a stress test should be performed for every iteration.
Designing for Performance
Given that applications should be designed for performance, it would be useful to cover an approach to doing that in
great detail. However, the focus here is on troubleshooting. For this reason, I limit myself to briefly describing
the ten database-related design problems that most frequently lead to suboptimal performance.
Lack of Logical Database Design
Once upon a time, it was considered obvious that a data architect should be involved in every development
project. Often this person was not only responsible for the data and the database design but was also part of the team
in charge of the overall architecture and design of the application. Such a person often had extensive experience with
databases and knew exactly how to design them to guarantee both data integrity and performance.
Today, unfortunately, this is not always the case. Too often I see projects in which no formal database design is
done. The application developers do the client and/or middle-tier design, and then the database design is simply
generated by a tool such as a persistence framework. In such projects, the database is treated as a dumb device that
stores data. That viewpoint is a mistake.
Implementing Generic Tables
Every CIO dreams of applications that are easily able to cope with new or changed requirements. The keyword is
flexibility . Such dreams sometimes materialize in the form of applications that use generic database designs. Adding
new data is just a matter of changing the configuration without changing the database objects themselves.
Two main database designs are used to achieve such flexibility:
Entity-attribute-value (EAV) models: As their name implies, at least three columns are
used to describe every piece of information: entity, attribute, and value. Each combination
defines the value of a specific attribute associated with a specific entity.
XML-based designs: Each table has just a few columns. Two columns are always present:
an identifier and an XML column that stores almost everything else. Sometimes a few other
columns are also available for storing metadata (for example, who made the last
modification and when).
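To make the EAV trade-off concrete, here is a minimal sketch (the schema and data are hypothetical, using SQLite only for illustration) that stores the same record conventionally and as entity-attribute-value rows, and shows the pivot work EAV pushes onto every query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Conventional design: one row per person, one column per attribute.
cur.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute("INSERT INTO person VALUES (1, 'Alice', 'Zurich')")

# EAV design: one row per (entity, attribute, value) combination.
cur.execute("CREATE TABLE eav (entity INTEGER, attribute TEXT, value TEXT)")
cur.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    (1, "name", "Alice"),
    (1, "city", "Zurich"),
])

# Reading the conventional row is a single-table lookup.
row = cur.execute("SELECT name, city FROM person WHERE id = 1").fetchone()

# The EAV rows must be pivoted back into columns: one MAX(CASE ...) branch
# (or one self-join) per attribute, growing with every attribute added.
eav_row = cur.execute("""
    SELECT MAX(CASE WHEN attribute = 'name' THEN value END),
           MAX(CASE WHEN attribute = 'city' THEN value END)
    FROM eav
    WHERE entity = 1
""").fetchone()

print(row, eav_row)
```

Both queries return the same logical record, but the EAV version reads several rows, applies a pivot per attribute, and gives the optimizer far less to work with (no per-attribute column statistics or data types), which is one reason such generic designs tend to perform poorly.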