Database Reference
In-Depth Information
This second query is easier to write and understand and will work regardless of the number of
children that an employee has.
The alternate design does require the DBMS to process two tables, and if the tables are
large and performance is a concern, one can argue that the original design is better. In such
cases, storing multivalues in multiple columns may be preferred. Another, less valid objection
to the two-table design is as follows: “We only need space for three cars because university
policy restricts each employee to registering no more than three cars.” The problem with this
statement is that databases often outlive policies. Next year that policy may change, and, if it
does, the database will need to be redesigned. As we discussed in Chapter 8, database redesign
is tricky, complex, and expensive. It is better to avoid the need for a database redesign.
By The WAy A few years ago, people argued that only three phone number columns
were needed per person: Home, Office, and Fax. Later they said, “Well,
OK, maybe we need four: Home, Office, Fax, and Mobile.” Today, who would want
to guess the maximum number of phone numbers a person might have? Rather than
guess, just store Phone in a separate table; such a design will allow each person to
have from none to an unlimited number of phone numbers.
You are likely to encounter the multivalue, multicolumn problem when creating data-
bases from nondatabase data. It is particularly common in spreadsheet and text data files.
Fortunately, the preferred two-table design is easy to create, and the SQL for moving the data
to the new design is easy to write.
By The WAy The multivalue, multicolumn problem is just another form of a multivalued
dependency. For the parking application, for example, rather than store
multiple rows in EMPLOYEE for each auto, multiple named columns are created in the
table. The underlying problem is the same, however.
Inconsistent Values
Inconsistent values are a serious problem when creating databases from existing data.
Inconsistencies occur because different users or different data sources may use slightly differ-
ent forms of the same data value. These slight differences may be hard to detect and will create
inconsistent and erroneous information.
One of the hardest such problems occurs when different users have coded the same en-
tries differently. One user may have coded a SKU_Description as Corn, Large Can ; another may
have coded the same item as Can, Corn, Large ; and another may have coded the entry as Large
Can Corn . Those three entries all refer to the same SKU, but they will be exceedingly difficult to
reconcile. These examples are not contrived; such problems frequently occur, especially when
combining data from different database, spreadsheet, and file sources.
A related, but simpler, problem occurs when entries are misspelled. One user may enter
Coffee , another may enter Coffeee . They will appear as two separate products.
Inconsistent data values are particularly problematic for primary and foreign key col-
umns. Relationships will be missing or wrong when foreign key data are coded inconsistently
or misspelled.
Two techniques can be used to find such problems. One is the same as the check for refer-
ential integrity shown on page 162. This check will find values for which there is no match and
will find misspellings and other inconsistencies.
Another technique is to use GROUP BY on the suspected column. For example, if we
suspect that there are inconsistent values on SKU_Description in the SKU_DATA table (and
note that here we are discussing and using the original SKU_DATA table with four columns as
 
Search WWH ::




Custom Search