reasons (and others) I choose to focus on governmental data mining and leave a
discussion of the actions of the commercial entities for a later day.
In addition, at this point it is useful to point out what this chapter will not
discuss (even within the realm of governmental data mining) given the chapter's
focus on privacy. The analysis here presented will be premised on an underlying
assumption that the tools here discussed are effective in achieving their analytical
objectives while maintaining an acceptably low level of false positives and
negatives. Whether this is indeed true is currently hotly debated (Harper & Jonas,
2006; Schneier, 2006) and notoriously difficult to measure and prove. Those
opposing data mining can make a strong case that these predictive automated
processes are, in general, inherently flawed and ineffective. In addition, they
might argue they are particularly unfair to the individuals they implicate. This
position has merit, and is no doubt true in specific contexts. The critiques
presented below, however, will be premised upon the contrary assumption (which
I believe is true in a variety of other settings), that data mining is effective and
operational. Yet even so, such forms of analysis might prove problematic as they
clash with other important interests. In addition, data mining generates concerns
related to the lack of transparency this practice entails, as well as the discrimination it
could generate. These too are important aspects, which are addressed elsewhere
within this volume (Chapters 17 and 19).
18.2 Governmental Data Mining: Definitions, Participants and
Problems
The term “data mining” has recently been used in several contexts by
policymakers and legal scholars. For this discussion, I revert to a somewhat
technical definition of this term of art. Here, data mining is defined as “the
nontrivial process of identifying valid, novel, potentially useful and ultimately
understandable patterns in data” (Fayyad et al., 1996). Within this broader topic,
the core of this chapter focuses on data mining which enables “pattern-based
searches” (also referred to as “event-based” data mining). These methods provide
for a greater level of automation and the discovery of unintended and previously
unknown information. Such methods can potentially generate great utility in the
novel scenarios law enforcement and intelligence now face - where a vast amount
of data is available, yet there is limited knowledge as to how it can be used and
what insights it can provide.
With “pattern-based analyses,” the analysts engaging in data mining do not
predetermine the specific factors the analytical process will apply at the end of the
day. They do, however, define the broader datasets which will be part of the
analysis. Analysts also define general parameters for the patterns and results
they are seeking and are willing to accept, such as an acceptable level
of error. Thereafter, the analysts let the software sift through the data and point
out trends within the relevant datasets, or ways in which the data could be
effectively sorted (Zarsky, 2002-2003). The data mining process could achieve
both descriptive and predictive tasks. In a predictive process (on which this