Database Design Using Normalization - Database Processing: Fundamentals, Design, and Implementation

Database Reference

In-Depth Information

This second query is easier to write and understand and will work regardless of the number of

children that an employee has.

The alternate design does require the DBMS to process two tables, and if the tables are

large and performance is a concern, one can argue that the original design is better. In such

cases, storing multivalues in multiple columns may be preferred. Another, less valid objection

to the two-table design is as follows: “We only need space for three cars because university

policy restricts each employee to registering no more than three cars.” The problem with this

statement is that databases often outlive policies. Next year that policy may change, and, if it

does, the database will need to be redesigned. As we discussed in Chapter 8, database redesign

is tricky, complex, and expensive. It is better to avoid the need for a database redesign.

By The WAy A few years ago, people argued that only three phone number columns

were needed per person: Home, Office, and Fax. Later they said, “Well,

OK, maybe we need four: Home, Office, Fax, and Mobile.” Today, who would want

to guess the maximum number of phone numbers a person might have? Rather than

guess, just store Phone in a separate table; such a design will allow each person to

have from none to an unlimited number of phone numbers.

You are likely to encounter the multivalue, multicolumn problem when creating data-

bases from nondatabase data. It is particularly common in spreadsheet and text data files.

Fortunately, the preferred two-table design is easy to create, and the SQL for moving the data

to the new design is easy to write.

By The WAy The multivalue, multicolumn problem is just another form of a multivalued

dependency. For the parking application, for example, rather than store

multiple rows in EMPLOYEE for each auto, multiple named columns are created in the

table. The underlying problem is the same, however.

Inconsistent Values

Inconsistent values are a serious problem when creating databases from existing data.

Inconsistencies occur because different users or different data sources may use slightly differ-

ent forms of the same data value. These slight differences may be hard to detect and will create

inconsistent and erroneous information.

One of the hardest such problems occurs when different users have coded the same en-

tries differently. One user may have coded a SKU_Description as Corn, Large Can ; another may

have coded the same item as Can, Corn, Large ; and another may have coded the entry as Large

Can Corn . Those three entries all refer to the same SKU, but they will be exceedingly difficult to

reconcile. These examples are not contrived; such problems frequently occur, especially when

combining data from different database, spreadsheet, and file sources.

A related, but simpler, problem occurs when entries are misspelled. One user may enter

Coffee , another may enter Coffeee . They will appear as two separate products.

Inconsistent data values are particularly problematic for primary and foreign key col-

umns. Relationships will be missing or wrong when foreign key data are coded inconsistently

or misspelled.

Two techniques can be used to find such problems. One is the same as the check for refer-

ential integrity shown on page 162. This check will find values for which there is no match and

will find misspellings and other inconsistencies.

Another technique is to use GROUP BY on the suspected column. For example, if we

suspect that there are inconsistent values on SKU_Description in the SKU_DATA table (and

note that here we are discussing and using the original SKU_DATA table with four columns as

Database Processing: Fundamentals, Design, and Implementation

Search WWH ::

Custom Search

Home