Chemistry Reference
In-Depth Information
A.3 Normalizing Data
When a data value is repeated multiple times in a column in a database, it
is said to violate third normal form. For example, a table of values for logp
might contain a column named ref having literature references. The value
'Hansch, et. al. (1995)' might be repeated many times. It is easy to spot this,
and easy to correct it as well. The following SQL can be used to help put a
table of logp values and references into third normal form.
-- New table to hold all references identified by unique id
Create Table literature_refs (refid Serial, reference Text);
-- Populate table with unique references found in logp table
Insert Into literature_refs (reference)
Select Distinct ref From logp Group By ref;
-- Create column in logp to hold reference id instead of reference
Alter Table logp Add Column refid integer;
-- Populate logp table's reference id column with appropriate values
Update logp Set refid =
(Select refid From literature_refs Where ref=reference);
-- No need for reference column anymore
Alter Table logp Drop Column ref;
This will create a table literature_refs that will hold all the unique values
of references that exist in the table logp . The comments in the above code
should explain the steps in this process. Once the literature _ refs
table is complete, a full reference can be obtained during a search of logp
using SQL like this.
Select cas, reference From logp Join literature_refs Using (refid);
A brief excerpt from a literature _ refs table is shown in Table A.1. It
was constructed using this technique and nicely illustrates an advantage
of normalizing a table in this way.
Table A.1 Sample Rows from Reference
Table Showing How Spelling Anomolies
Can Be Easily Identified
Refid
Reference
2
ABRAHAM MH ET AL. (1994)
3
ABRAHAM,MH ET AL. (1994)
84
CHEM INSPECT TESTING INST (1992)
85
CHEM INSPECT TEST INST (1992)
86
CHEM INSPEXT TEST INST (1992)
131
EL TAYAR,N ET AL. (1985)
132
EL TAYAR,N ET AL. (1991)
133
EL TAYER,N ET AL. (1985)
Search WWH ::




Custom Search