of denying products and services in particular neighborhoods, marked with a red
line on a map to delineate where not to invest. This resulted in discrimination
against black inner city neighborhoods. For instance, when people living in a
particular zip code area have a high health risk, insurance companies may use the
zip code (trivial information) as an indication of a person's health (sensitive
information), and may thus use the trivial information as a selection criterion.
Note that refusing insurance on the basis of a zip code may be acceptable, as
companies may choose (on the basis of market freedom) the geographic areas in
which they operate. On the other hand, refusing insurance on the basis of sensitive
data may be prohibited on the basis of anti-discrimination law. Masking may
reduce transparency for a data subject, as he or she may not know the
consequences of filling in trivial information, such as a zip code. In databases,
redlining need not occur through geographical profiling; it may also occur through
the profiling of other characteristics.
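The zip code example above can be made concrete with a small sketch. The records, the zip codes and the 0.5 threshold below are invented purely for illustration, and the function names are hypothetical. The sketch only shows how a selection rule that never looks at health data can still reproduce a health-based selection.

```python
from collections import defaultdict

# Hypothetical records: (zip_code, high_health_risk). Invented for illustration only.
records = [
    ("1011", True), ("1011", True), ("1011", True), ("1011", False),
    ("2022", False), ("2022", False), ("2022", False), ("2022", True),
]


def risk_rate_by_zip(rows):
    """Return the share of high-risk people per zip code."""
    counts = defaultdict(lambda: [0, 0])  # zip code -> [high-risk count, total count]
    for zip_code, high_risk in rows:
        counts[zip_code][0] += int(high_risk)
        counts[zip_code][1] += 1
    return {z: high / total for z, (high, total) in counts.items()}


def refused_zip_codes(rows, threshold=0.5):
    """Zip codes that a rule based only on the trivial attribute would exclude.

    The threshold is an assumed cut-off, not a value taken from the text.
    """
    return sorted(z for z, rate in risk_rate_by_zip(rows).items() if rate > threshold)


rates = risk_rate_by_zip(records)
refused = refused_zip_codes(records)
```

In this toy data set the rule excludes everyone in one zip code, including the person there who is not at high risk. The applicant only supplies a zip code and cannot see that it stands in for health information.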
Step 5: Acting upon Discovered Knowledge
Step 5 consists of determining corresponding actions. Such actions are, for
instance, the selection of people with particular characteristics or the prediction of
people's health risks. Several practical applications are discussed in Part III of this
topic. During the entire knowledge discovery process, it is possible, and
sometimes necessary, to feed back information obtained in a particular step to
earlier steps. Thus, the process can be discontinued and started over again when
the information obtained does not answer the questions that need to be answered.
1.2.2 From Data to Knowledge
The KDD process may be very helpful in finding patterns and relations in large
databases that are not immediately visible to the human eye. Generally, deriving
patterns and relations is considered to create added value out of databases, as the
patterns and relations provide insight and overview and may be used for
decision-making. The plain database may not (or at least not immediately) provide such
insight. For that reason, usually a distinction is made between the terms data and
knowledge. Data is a set of facts, the raw material in databases usable for data
mining, whereas knowledge is a pattern that is interesting and certain enough for a
user. 14 It may be obvious that knowledge is therefore a subjective term, as it
depends on the user. For instance, a relation between vegetable consumption and
health may be interesting to an insurance company, whereas it may not be
interesting to an employment agency. Since a pattern in data must fulfill two
conditions (interestingness and certainty) in order to become knowledge, we will
discuss these conditions in more detail.
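The distinction between data and knowledge can be sketched as a filter over discovered patterns. This is a minimal illustration under two loud assumptions that are not in the text: interestingness is reduced to a user-supplied test, and certainty is a score between 0 and 1 with an arbitrary cut-off of 0.8. The Pattern class, the knowledge_for function and all example patterns except the vegetable one are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Pattern:
    description: str
    topic: str
    certainty: float  # assumed to be a score in [0, 1]


def knowledge_for(
    patterns: List[Pattern],
    is_interesting: Callable[[Pattern], bool],
    min_certainty: float = 0.8,  # assumed cut-off, chosen for illustration
) -> List[Pattern]:
    """Keep only patterns that are both interesting to this user and certain enough."""
    return [p for p in patterns if is_interesting(p) and p.certainty >= min_certainty]


patterns = [
    Pattern("vegetable consumption relates to health", "health", 0.9),
    Pattern("commuting distance relates to job tenure", "employment", 0.85),
    Pattern("shoe size relates to health", "health", 0.3),
]

# Two users with different interests look at the same set of patterns.
insurer = knowledge_for(patterns, lambda p: p.topic == "health")
agency = knowledge_for(patterns, lambda p: p.topic == "employment")
```

The same set of patterns yields different knowledge for the two users, which is the sense in which knowledge is subjective. The third pattern is interesting to the insurer but too uncertain, so it fails the second condition.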
According to Frawley et al. (1991), interestingness requires three things: novelty,
usefulness and non-triviality. Whether a pattern is novel depends on the user's
14 Frawley, W.J., Piatetsky-Shapiro, G. and Matheus, C.J. (1993).