Data Dilemmas in the Information Society: Introduction and Overview - Discrimination and Privacy in the Information Society

of denying products and services in particular neighborhoods, marked with a red

line on a map to delineate where not to invest. This resulted in discrimination

against black inner city neighborhoods. For instance, when people living in a

particular zip code area have a high health risk, insurance companies may use the

zip code (trivial information) as an indication of a person's health (sensitive

information), and may thus use the trivial information as a selection criterion.

Note that refusing insurance on the basis of a zip code may be acceptable, as

companies may choose (on the basis of market freedom) the geographic areas in

which they operate. On the other hand, refusing insurance on the basis of sensitive

data may be prohibited on the basis of anti-discrimination law. Masking may

reduce transparency for a data subject, as he or she may not know the

consequences of filling in trivial information, such as a zip code. In databases

redlining may occur not necessarily by geographical profiling, but also by

profiling other characteristics.
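The masking effect described above can be illustrated with a minimal sketch. All records, zip codes, the function names and the 0.5 threshold are hypothetical and chosen only for illustration; they do not come from the text or from any real insurer. The sketch shows how a rule that only mentions a zip code (trivial information) can end up selecting mostly people with a high health risk (sensitive information), while also refusing a low-risk person who happens to live in the same area.

```python
from collections import defaultdict

# Hypothetical records: (zip_code, has_high_health_risk). Invented for illustration.
applicants = [
    ("1011", True), ("1011", True), ("1011", True), ("1011", False),
    ("2022", False), ("2022", False), ("2022", True), ("2022", False),
]

def risk_rate_by_zip(records):
    """Return the share of high-risk people per zip code."""
    totals = defaultdict(int)
    high_risk = defaultdict(int)
    for zip_code, is_high_risk in records:
        totals[zip_code] += 1
        if is_high_risk:
            high_risk[zip_code] += 1
    return {z: high_risk[z] / totals[z] for z in totals}

def refused_zip_codes(records, threshold=0.5):
    """Zip codes whose high-risk share exceeds the assumed threshold."""
    rates = risk_rate_by_zip(records)
    return {z for z, rate in rates.items() if rate > threshold}

rates = risk_rate_by_zip(applicants)
refused = refused_zip_codes(applicants)

# Everyone in a refused zip code is affected, including low-risk residents.
affected = [record for record in applicants if record[0] in refused]
affected_high_risk = sum(1 for _, is_high_risk in affected if is_high_risk)
affected_low_risk = len(affected) - affected_high_risk

print(rates)
print(refused, affected_high_risk, affected_low_risk)
```

The selection rule never refers to health, yet in this toy data three of the four refused applicants are high-risk, which is the sense in which the zip code masks the sensitive attribute.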

Step 5: Acting upon Discovered Knowledge

Step 5 consists of determining corresponding actions. Such actions are, for

instance, the selection of people with particular characteristics or the prediction of

people's health risks. Several practical applications are discussed in Part III of this

topic. During the entire knowledge discovery process, it is possible (and

sometimes necessary) to feed back information obtained in a particular step to

earlier steps. Thus, the process can be discontinued and started over again when

the information obtained does not answer the questions that need to be answered.

1.2.2 From Data to Knowledge

The KDD process may be very helpful in finding patterns and relations in large

databases that are not immediately visible to the human eye. Generally, deriving

patterns and relations is considered to create added value out of databases, as the

patterns and relations provide insight and overview and may be used for decision-

making. The plain database may not (or at least not immediately) provide such

insight. For that reason, usually a distinction is made between the terms data and

knowledge. Data is a set of facts, the raw material in databases usable for data

mining, whereas knowledge is a pattern that is interesting and certain enough for a

user. 14 It may be obvious that knowledge is therefore a subjective term, as it

depends on the user. For instance, a relation between vegetable consumption and

health may be interesting to an insurance company, whereas it may not be

interesting to an employment agency. Since a pattern in data must fulfill two

conditions (interestingness and certainty) in order to become knowledge, we will

discuss these conditions in more detail.
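The distinction between data and knowledge can be sketched in a few lines of code. This is only an illustration under stated assumptions: the class names, the use of subject tags to stand in for interestingness, and the numeric certainty threshold are all hypothetical and are not taken from the text. The sketch treats a pattern as knowledge for a given user only when it is both interesting to that user and certain enough by that user's own standard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pattern:
    description: str
    subjects: frozenset   # hypothetical tags describing what the pattern is about
    certainty: float      # assumed to lie between 0 and 1

@dataclass(frozen=True)
class User:
    name: str
    interests: frozenset  # subjects this user cares about (assumption)
    min_certainty: float  # user-specific certainty threshold (assumption)

def is_knowledge(pattern, user):
    """A pattern counts as knowledge only if it meets both conditions for this user."""
    interesting = bool(pattern.subjects & user.interests)
    certain_enough = pattern.certainty >= user.min_certainty
    return interesting and certain_enough

pattern = Pattern(
    description="vegetable consumption is related to health",
    subjects=frozenset({"diet", "health"}),
    certainty=0.9,
)
insurer = User("insurance company", frozenset({"health"}), min_certainty=0.8)
agency = User("employment agency", frozenset({"employment"}), min_certainty=0.8)

print(is_knowledge(pattern, insurer))  # interesting and certain enough for the insurer
print(is_knowledge(pattern, agency))   # not interesting to the agency
```

The same pattern yields different answers for the two users, which reflects the point that knowledge is a subjective term.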

Interestingness

According to Frawley et al. (1991), interestingness requires three things: novelty,

usefulness and non-triviality. Whether a pattern is novel depends on the user's

14 Frawley, W.J., Piatetsky-Shapiro, G. and Matheus, C.J. (1993).
