Transparency in Data Mining: From Theory to Practice - Discrimination and Privacy in the Information Society

17.1 Introduction: Transparency, Technology and Prediction

Can human behavior be predicted? A broad variety of governmental initiatives are using computerized processes to try. Recent advances in mathematics, artificial intelligence and computer science might render this futuristic scenario possible. Vast datasets of personal information, available to commercial and governmental entities, enhance both the ability to engage in these ventures and the appetite to push them forward.

Governments have a distinct interest in automated, individualized predictions that foresee unlawful actions. This is especially true when such conduct generates substantial risks or when the relevant rules are difficult to enforce. Data mining applications are the technological tools that make governmental prediction possible. They are essential for coping with the vast amounts of personal information at the government's disposal and with the need to analyze that information in real time. These computer programs automatically work through vast datasets to uncover trends in personal data. They then apply the newly revealed trends and patterns to other individuals and actions, sorting them accordingly. In doing so, they try to anticipate individuals' next steps: which of us is more likely to be a tax evader, a criminal, or even a terrorist.
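The two-step process described above (mining historical records for patterns, then applying those patterns to sort other individuals) can be illustrated with a deliberately simplified Python sketch. It assumes a small set of hypothetical, labeled historical records and uses a naive per-attribute frequency score. The attribute names, the data and the functions `learn_patterns`, `score` and `rank` are all invented for illustration; they do not describe any actual governmental system, which would rely on far more elaborate models.

```python
from collections import defaultdict

# Hypothetical toy records: (attributes, outcome), where outcome 1 means the
# person was later found to have acted unlawfully. Invented for illustration.
HISTORY = [
    ({"claims_band": "high", "filed_late": True}, 1),
    ({"claims_band": "high", "filed_late": False}, 0),
    ({"claims_band": "low", "filed_late": True}, 1),
    ({"claims_band": "low", "filed_late": False}, 0),
    ({"claims_band": "low", "filed_late": False}, 0),
]


def learn_patterns(records):
    """Step 1: mine past records for the outcome rate of each attribute value."""
    counts = defaultdict(lambda: [0, 0])  # (attribute, value) -> [positives, total]
    for attributes, outcome in records:
        for key, value in attributes.items():
            counts[(key, value)][0] += outcome
            counts[(key, value)][1] += 1
    return {pair: pos / total for pair, (pos, total) in counts.items()}


def score(person, patterns, default=0.0):
    """Step 2: apply the learned rates to a new individual (mean of matching rates)."""
    rates = [patterns.get((key, value), default) for key, value in person.items()]
    return sum(rates) / len(rates) if rates else default


def rank(people, patterns):
    """Sort individuals (name -> attributes) from highest to lowest predicted risk."""
    return sorted(people, key=lambda name: score(people[name], patterns), reverse=True)


# Usage: score individuals who were not part of the historical data.
patterns = learn_patterns(HISTORY)
NEW_PEOPLE = {
    "A": {"claims_band": "low", "filed_late": True},
    "B": {"claims_band": "high", "filed_late": False},
    "C": {"claims_band": "low", "filed_late": False},
}
ranking = rank(NEW_PEOPLE, patterns)
print(ranking)  # ['A', 'B', 'C']
```

Averaging per-attribute rates is only one of many possible scoring rules; it is used here because every learned pattern can be read directly from the dictionary returned by `learn_patterns`.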

The growing use of predictive practices premised upon the analysis of personal information and powered by data mining has generated a flurry of negative reactions and responses. An overarching concern is the lack of transparency these processes entail. Calls for transparency come from the public, the press and even the US legislature.¹ A need for transparency is commonly cited when calling for changes in these initiatives (TAPAC Report, 2004; Cate, 2008; Solove, 2008). Although this call is echoed across the policy, legal and academic debate, the meaning of transparency in this context remains unclear and calls for rigorous analysis.

Transparency might pertain to different segments of the data mining and prediction process. In addition, it flows from different, even competing, rationales and from a variety of legal and philosophical backgrounds. Viewed together, these rationales lead to different, at times contradictory, conclusions and practical recommendations. This chapter takes initial steps toward illuminating the meaning of transparency in this specific context and provides tools for examining the issue further.

This chapter begins by briefly describing the practices of data mining when used to predict future human conduct on the basis of previously collected personal information (Part 2). Part 3 then addresses the flow of information generated in the prediction process. In doing so, it introduces a taxonomy of four distinct segments within the prediction process, each of which presents unique transparency-related challenges. This part also offers initial strategies for achieving transparency at each juncture. Part 4 begins a brief theoretical analysis of the foundations for transparency requirements in this context. The analysis addresses transparency as a tool to enhance government efficiency, facilitate crowdsourcing and promote

1 Federal Agency Data Mining Reporting Act, 42 U.S.C. § 2000ee-3(c)(2).
