Databases Reference
In-Depth Information
Conducting a Preliminary Investigation of Data Quality
Without some understanding of data structures for the current
application, it is not possible to look at the quality of the data. To
examine the quality of the data, the DCT can run existing reports, do
online queries and, if possible, quickly write some fourth-generation
language programs to examine issues such as referential, primary key,
and domain integrity violations that the users might never notice. When
the investigation is done, the findings can be documented formally.
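The three kinds of integrity check named above can be sketched as short queries. This is only an illustration: it uses Python with an in-memory SQLite database in place of a fourth-generation language, and the `customer` and `orders` tables, their columns, and the allowed status codes are all hypothetical.

```python
import sqlite3

# Hypothetical legacy tables, loaded with sample rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER, name TEXT, status TEXT);
    CREATE TABLE orders   (order_id INTEGER, cust_id INTEGER);
    INSERT INTO customer VALUES (1, 'Ames', 'A'), (2, 'Boyd', 'X'), (2, 'Cole', 'I');
    INSERT INTO orders   VALUES (10, 1), (11, 3);
""")

def duplicate_keys(conn):
    """Primary key check: cust_id values that occur more than once."""
    rows = conn.execute(
        "SELECT cust_id FROM customer "
        "GROUP BY cust_id HAVING COUNT(*) > 1 ORDER BY cust_id"
    )
    return [r[0] for r in rows]

def orphan_orders(conn):
    """Referential check: orders whose cust_id matches no customer."""
    rows = conn.execute(
        "SELECT o.order_id FROM orders o "
        "LEFT JOIN customer c ON c.cust_id = o.cust_id "
        "WHERE c.cust_id IS NULL ORDER BY o.order_id"
    )
    return [r[0] for r in rows]

def bad_status(conn, allowed=("A", "I")):
    """Domain check: customers whose status is outside the allowed set."""
    placeholders = ",".join("?" for _ in allowed)
    rows = conn.execute(
        f"SELECT name FROM customer "
        f"WHERE status NOT IN ({placeholders}) ORDER BY name",
        allowed,
    )
    return [r[0] for r in rows]
```

Each function returns a list of offending values, which could be copied directly into the formal findings. Note that the domain check as written does not flag rows whose status is NULL; those would need a separate test.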
Analyzing the Old Logical Data Model
When the physical structure of the data is understood, it can be
represented in its normalized logical structure. This step, although
seemingly unnecessary, allows the DCT to specify the mapping in a much
more reliable fashion. The results should be documented with the aid of
an entity-relationship diagram accompanied by dictionary descriptions.
Analyzing the New Physical Data Model
The new logical model should be transformed into a physical
representation. If a relational database is being used, this may be a
simple step. Once this model is done, the mapping can be specified.
Determining the Data Mapping
This step is often more difficult than it might seem initially. Simple
one-to-one mappings, where one old file maps to one new file and one old
field maps to one new field, are usually the exceptions.
Often there are cases where the old domain must be transformed into a
new one; an old field is split into two new ones; two old fields become
one new one; or multiple records are looked at to derive a new one.
There are many ways of reworking the data, and an unlimited number of
special cases may exist. Not only are the possibilities for mapping
numerous and complex, in some cases it is not possible at all to map to
the new model because key information was not collected in the old
system.
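The four mapping cases listed above can be illustrated with a small sketch. All field names, the status codes, and the record layout are hypothetical; a real mapping would follow the documented old and new models.

```python
from collections import defaultdict

# Hypothetical domain translation: old one-letter codes to new values.
STATUS_MAP = {"A": "ACTIVE", "I": "INACTIVE"}

def map_customer(old):
    """Map one old customer record (a dict) to the new layout."""
    # One old field split into two new ones (split at the first space).
    first, _, last = old["name"].partition(" ")
    return {
        "first_name": first,
        "last_name": last,
        # Two old fields combined into one new one.
        "phone": old["area_code"] + old["local_number"],
        # Old domain transformed into the new one. Unmapped codes become
        # None so they can be reported rather than silently converted.
        "status": STATUS_MAP.get(old["status"]),
    }

def derive_balances(transactions):
    """Derive one new value per account from multiple old records."""
    totals = defaultdict(int)
    for t in transactions:
        totals[t["account"]] += t["amount_cents"]
    return dict(totals)
```

Splitting a name at the first space is itself one of the special cases the text warns about: it gives the wrong result for multi-word first names, so a production rule would need agreed exceptions.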
Determining How to Treat Missing Information
It is common when doing conversion to discover that some of the data
to populate the new application is not available, and there is no provision
for it in the old database. It may be available elsewhere as manual records,
or it may never have been recorded at all.
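Gaps of this kind can be found early by comparing the fields the new application requires against the mapping specification. The following is a minimal sketch that assumes the mapping is kept as a dictionary from each new field to its old source fields; the field names are hypothetical.

```python
def unmapped_fields(required_fields, mapping):
    """Return required new fields that have no source in the old system."""
    return sorted(f for f in required_fields if not mapping.get(f))

# Hypothetical mapping specification: new field -> old source field(s).
MAPPING = {
    "first_name": ["name"],
    "last_name": ["name"],
    "birth_date": [],  # listed in the mapping, but no old source exists
}

# Hypothetical list of fields the new application requires.
REQUIRED = ["first_name", "last_name", "birth_date", "tax_id"]

missing = unmapped_fields(REQUIRED, MAPPING)
```

Here `missing` contains `birth_date` and `tax_id`: one has an empty source list and the other is absent from the mapping altogether. Each such field then needs a decision about where, if anywhere, the data can be obtained.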
Sometimes, this is only an inconvenience — dummy values can be put in
certain fields to indicate that the value is not known. In the more serious
case, the missing information would be required to create a primary key or
a foreign key. This can occur when the new model is significantly different