data is contextually determined in time and location, the so-called spatio-temporal
context. The phrase 'It is cold here' might signify different things in different
contexts. If it is uttered after a long trip through the desert, it might signify a positive
feeling, while if it is uttered in a room with an open window, it might signify
'Could you please close the window'. Likewise, the time at which a phrase is ut-
tered is significant. 52 Furthermore, the context may change over time. The phrase
'A bald man living on Abbey Road 4 in London' may originally signify only
person A, but over time could relate to both person A and B, to person B only
or to no one at all. Reference can also be made to so-called contextual and
conversational implicatures. Suppose that, just after a job interview, the employer were
to contact one of the persons on the list of references and ask that person
whether the applicant would be fit for a university job as a researcher, and the
answer were 'Well, I can tell you for sure that he makes good coffee'. Since the
presumption is that a speaker will provide the maximum relevant information and
this information is not relevant at all in this specific context, this would presumably
mean 'no'. 53 (Again, this changes if uttered when applying for a job in the
canteen). Contextuality is essential to understanding and interpreting data and in-
formation.
In a way, data mining, profiling and knowledge discovery in databases give
rise to a form of collective autism. Knowledge discovery in databases has the
tendency to disregard the contextuality of information. Data are sometimes incorrect,
incomplete and out of date; the data set may be tilted towards a certain group of
people due to the research methodology; the data may be analyzed and used in a
different context and for a different purpose than was originally intended; and it is
not uncommon that the context in which rules and profiles are put to work in
practice is disregarded.
Knowledge discovery in databases may conflict with legal provisions regarding
discrimination and privacy. A currently widely propagated solution is that of data
minimization, which entails a restriction on the amount of sensitive data gathered,
analyzed in the data mining process and used in practical decisions based on the
data mining results. The tendency in knowledge discovery in databases to disregard
the context of the data is only aggravated by the data minimization principle.
The loss of contextuality leads to a loss of value of the database and of the outcome
of the data mining process. Moreover, as this chapter has argued, the loss of
contextuality may give rise to new privacy and discrimination problems or aggravate
existing ones. Thus, sometimes, the data minimization principle may have a
counterproductive effect.
Therefore, rather than minimizing the amount of data, this chapter has argued for
a minimum amount of data. This replaces the data minimization principle with the
data minimummization principle. The latter principle requires a minimum set of
data to be gathered, stored and clustered when used in practice. First, with regard to
the gathering of data, the methodology with which, the context in which and the reasons
for which the data were gathered should be included. With regard to storing data, the
data must be correct, accurate and kept up to date; the decisions on categorization
52 Grice (1975).
53 This example refers to the maxim of relevance.