Database Reference
In-Depth Information
The best solution in this case is to identify all of the different purposes of the remarks
column, create new columns for each of those purposes, and then extract the data and store it
into the new columns as appropriate. However, this solution can seldom be automated.
In practice, all solutions require patience and hours of labor. Learn to be wary of such col-
umns, and don't take such jobs on a fixed-price basis!
Summary
When constructing a database from existing data, the first
step is to assess the structure and content of the input tables.
Count the number of rows and use the SQL SELECT TOP
{NumberOfRows} * phrase to learn the columns in the data.
Then examine the data and determine functional dependen-
cies, multivalued dependencies, candidate keys, each table's
primary key, and foreign keys. Check out the validity of pos-
sible referential integrity constraints.
Design principles differ depending on whether an up-
datable or read-only database is being constructed. If the for-
mer, then modification anomalies and inconsistent data are
concerns. The advantages of normalization are elimination
of modification anomalies, reduced data duplication, and
the elimination of data inconsistencies. The disadvantages
are that more complex SQL will be required and application
performance may be slower.
For updatable databases, most of the time the problems
of modification anomalies are so great that all tables should
be placed in BCNF. SQL for normalization is easy to write.
In some cases, if the data will be updated infrequently and if
inconsistencies are readily corrected by business processes,
then BCNF may be too pure and the tables should not be
normalized. The problems of multivalued dependencies are
so great that they should always be removed.
Read-only databases are created for reporting, query-
ing, and data mining applications. Creating such a database
is a task commonly assigned to beginners. When designing
read-only databases, normalization is less desired. If input
data is normalized, it frequently needs to be denormalized
by joining it together and storing the joined result. Also,
sometimes many copies of the same data are stored in tables
customized for particular applications.
Four common problems occur when creating databases
from existing data. The multivalue, multicolumn design sets
a fixed number of repeating values and stores each in a col-
umn of its own. Such a design limits the number of items
allowed and results in awkward SQL query statements. A
better design results from putting multiple values in a table
of their own.
Inconsistent values result when data arise from different
users and applications. Inconsistent foreign key values create
incorrect relationships. Data inconsistencies can be detected
using SQL statements, as illustrated in this chapter. A null
value is not the same as a blank. A null value is not known
to be anything. Null values are a problem because they are
ambiguous. They can mean that a value is inappropriate, un-
known, or known but not yet entered into the database.
The general-purpose remarks column is a column that
is used for different purposes. It collects data items in an in-
consistent and verbose manner. Such columns are especially
problematic if they contain data needed for a foreign key.
Even if they do not, they often contain data for several differ-
ent columns. Automated solutions are not possible, and the
correction requires patience and labor.
Key Terms
denormalize
empty set
null value
SQL COUNT(*) function
SQL DROP TABLE statement
SQL INSERT statement
SQL SELECT * statement
SQL TOP {NumberOfRows} property
 
 
 
 
Search WWH ::




Custom Search