delimiter can be considered “outside.” Sometimes data sets have missing
values, and if you represent missing values by putting in an asterisk or a
space or a special code, that is processing of sorts and the data is not
raw anymore. Ideally, every transformation should be traceable, preferably
in both reverse chronological order and upstream/downstream order.
Such a trail can be used for audit purposes or for improving data quality.
Of course, while recording the entire trail, do not forget to record the
changes to the metadata as well.
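A traceable transformation trail of this kind can be sketched in a few lines of Python. This is only an illustrative sketch, not a prescribed design: the names `Step`, `TracedValue`, `apply`, `history`, and `original`, and the asterisk used as the missing-value code, are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass(frozen=True)
class Step:
    """One recorded transformation (hypothetical structure)."""
    name: str                   # label for the transformation
    before: Any                 # value entering the step
    after: Any                  # value leaving the step
    metadata_change: str = ""   # note on any change to the metadata


@dataclass(frozen=True)
class TracedValue:
    """A value that carries the trail of every transformation applied to it."""
    value: Any
    trail: List[Step] = field(default_factory=list)

    def apply(self, name: str, fn: Callable[[Any], Any],
              metadata_change: str = "") -> "TracedValue":
        # Transform the value and append a record of the step to a new trail.
        new_value = fn(self.value)
        step = Step(name, self.value, new_value, metadata_change)
        return TracedValue(new_value, self.trail + [step])

    def history(self) -> List[str]:
        # Step names in reverse chronological order (most recent first).
        return [s.name for s in reversed(self.trail)]

    def original(self) -> Any:
        # The value as it entered this trail, before any recorded step.
        return self.trail[0].before if self.trail else self.value


MISSING = "*"  # assumed code for a missing value

reading = TracedValue(None)  # a missing reading arrives as None
coded = reading.apply(
    "encode_missing",
    lambda v: MISSING if v is None else v,
    metadata_change="field type widened from number to text",
)
padded = coded.apply("pad", lambda v: str(v).rjust(3))
```

Here `padded.history()` lists the steps most recent first, and `padded.original()` recovers the value that entered the trail. Note that this is "raw" only relative to the trail itself; anything done upstream of it is not recorded.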
The concept of upstream and downstream is often conflated with the
concept of raw data. The intuitive assumption is that upstream systems
have “rawer” data than downstream ones, because changes flow down. Keep
in mind, however, that the most upstream system does not necessarily have
the “rawest” data: its most upstream point may itself be downstream of
another system. Some databases attempt to maintain both raw and
enriched data in their schemas, allowing access to both. This is because
users may want to see what “originally came through.” The raw data in
such cases may only be raw in the sense that it was on the input side of
that system. Calling it raw does not mean that upstream systems have not
touched it. The enriched data is the “cleaned-up” version. For example,
one may wish to retain both the audio recording and the transcript of a
hearing. However, unless one can prove that the recording has not been
subsequently edited, transformed, or manipulated, it cannot be treated
as raw data.
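One common way to support such a proof is to record a cryptographic digest of the input at the moment it arrives and to recompute it later. The sketch below assumes a hypothetical `Record` type that keeps the raw bytes, their SHA-256 digest, and an enriched version side by side; the function names and the whitespace-normalizing "enrichment" are illustrative only. In practice the digest would need to be stored somewhere the data's editors cannot alter, and it says nothing about what upstream systems did before ingestion.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Raw and enriched data kept side by side (hypothetical schema)."""
    raw: bytes        # exactly what arrived on the input side of this system
    raw_sha256: str   # digest taken at ingestion time
    enriched: str     # the "cleaned-up" version


def ingest(raw: bytes) -> Record:
    # Fingerprint the input before touching it, then derive the enriched form.
    digest = hashlib.sha256(raw).hexdigest()
    enriched = " ".join(raw.decode("utf-8").split())  # collapse whitespace
    return Record(raw, digest, enriched)


def raw_is_unmodified(record: Record) -> bool:
    # Recompute the digest and compare it with the one stored at ingestion.
    return hashlib.sha256(record.raw).hexdigest() == record.raw_sha256


rec = ingest(b"  The  hearing is\tadjourned. \n")

# Simulate a later edit to the raw field that leaves the stored digest alone.
tampered = Record(rec.raw + b"!", rec.raw_sha256, rec.enriched)
```

With this arrangement, `raw_is_unmodified(rec)` holds for the untouched record and fails for `tampered`, while users can still read either `rec.raw` or `rec.enriched`.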
Sometimes, raw data may actually exist outside the formal databases.
The raw data for a payment transaction may actually be on a signed
check. If the check is scanned and stored, the scanned image is raw
relative to the data entry transaction, but the physical check itself
remains the raw data because scanning or compression may introduce errors.
How can one bypass the filters to get to raw data? Designers must
allow for such a possibility. One may want to send data to a printer more
directly than through a particular printer driver. One may want to look
at the image of the check rather than a listing of its posting in a database.
Getting raw data can be expensive. In any system where there is a
demand for raw data, the designer must question the request and verify
its necessity before providing it. However, designers need to
incorporate into their systems some transparency about the degree of
rawness of the data for the benefit of the user. For example, NASA pictures
that are computer renditions are identified as such, and translated poems
are clearly identified as translations.
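One possible way to make the degree of rawness visible to users is to attach an explicit label to each item. The sketch below is only an illustration: the `Rawness` levels, the `Labeled` type, and the `describe` function are assumed names, and the four-level scale is an arbitrary choice rather than an established standard.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rawness(IntEnum):
    """Assumed scale: a higher number means further from the original."""
    ORIGINAL = 0     # the source artifact itself, e.g. a signed check
    CAPTURED = 1     # a direct capture, e.g. a scanned image
    TRANSCRIBED = 2  # re-keyed or transcribed, e.g. a data entry posting
    DERIVED = 3      # a rendition or translation


@dataclass(frozen=True)
class Labeled:
    content: str
    rawness: Rawness
    note: str  # short human-readable explanation of the label


def describe(item: Labeled) -> str:
    # Put the rawness label in front so the user sees it first.
    return f"[{item.rawness.name.lower()}] {item.content} ({item.note})"


scan = Labeled("check image", Rawness.CAPTURED, "scanned, compressed")
posting = Labeled("check posting", Rawness.TRANSCRIBED, "data entry")
```

Because `Rawness` is an `IntEnum`, labels can be compared directly, so `scan.rawness < posting.rawness` expresses that the scanned image is rawer than the posting derived from it.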