Data Mining as Search: Theoretical Insights and Policy Responses - Discrimination and Privacy in the Information Society

reasons (and others) I choose to focus on governmental data mining and leave a

discussion of the actions of the commercial entities for a later day.

In addition, at this point it is useful to point out what this chapter will not

discuss (even within the realm of governmental data mining) given the chapter's

focus on privacy. The analysis presented here is premised on the underlying

assumption that the tools discussed here are effective in achieving their analytical

objectives while maintaining an acceptably low level of false positives and

negatives. Whether this is indeed true is currently hotly debated (Harper & Jonas,

2006; Schneier, 2006) and notoriously difficult to measure and prove. Those

opposing data mining can make a strong case that these predictive automated

processes are, in general, inherently flawed and ineffective. In addition, they

might argue that these processes are particularly unfair to the individuals they implicate. This

position has merit, and is no doubt true in specific contexts. The critiques

presented below, however, will be premised upon the contrary assumption (which

I believe is true in a variety of other settings), that data mining is effective and

operational. Yet even so, such forms of analysis might prove problematic as they

clash with other important interests. In addition, data mining generates concerns

related to the lack of transparency this practice entails, as well as discrimination it

could generate. These too are important aspects which are addressed elsewhere

within this volume (Chapters 17 and 19).

18.2 Governmental Data Mining: Definitions, Participants and Problems

The term “data mining” has recently been used in several contexts by

policymakers and legal scholars. For this discussion, I revert to a somewhat

technical definition of this term of art. Here, data mining is defined as the

“nontrivial process of identifying valid, novel, potentially useful and ultimately

understandable patterns in data” (Fayyad et al., 1996). Within this broader topic,

the core of this chapter focuses on data mining which enables “pattern-based

searches” (also referred to as “event-based” data mining). These methods provide

for a greater level of automation and the discovery of unintended and previously

unknown information. Such methods can potentially generate great utility in the

novel scenarios law enforcement and intelligence now face, where a vast amount

of data is available, yet there is limited knowledge as to how it can be used and

what insights it can provide.

With “pattern-based analyses,” the analysts engaging in data mining do not

predetermine the specific factors the analytical process will apply at the end of the

day. They do, however, define the broader datasets which will be part of the

analysis. Analysts also define general parameters for the patterns and results

which they are seeking and are willing to accept, such as the acceptable level

of error. Thereafter, the analysts let the software sift through the data and point

out trends within the relevant datasets, or ways in which the data could be

effectively sorted (Zarsky, 2002-2003). The data mining process could achieve

both descriptive and predictive tasks. In a predictive process (on which this
