Transparency in Data Mining: From Theory to Practice - Discrimination and Privacy in the Information Society

forced to merely state that the individual was singled out based on the algorithm, which was structured on the basis of previous findings.⁷

A policy decision mandating interpretable results calls upon analysts to work through the statistical outputs they receive, understand their meaning and articulate them clearly. In doing so, analysts note the correlations between higher risks and personal factors (such as height, age, or specific credit or purchasing history). With this information, the analyst sets up profiles based on these findings, defines their parameters, and applies them to future events. When seeking correlations, analysts might choose to ignore findings that seem ridiculous or cannot be explained by an intuitive causation model. Thus, interpretability could be considered an important step in assuring quality and precision, and in ensuring that the results are not merely anecdotal. The analyst could also respond to external inquiries as to what triggered the special treatment of an event or individual. The flip side is that interpretability calls for models that are less complex and therefore less accurate (Martens & Provost, 2011).
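The analyst workflow described above (note correlations, build a profile, apply it to future events, explain why an event was flagged) can be sketched in a few lines. This is a minimal, hypothetical illustration rather than any method from the chapter: it assumes binary personal factors, uses lift (the high-risk rate among records sharing a factor divided by the overall high-risk rate) as the correlation measure, and invents the factor names, threshold and data. The `excluded` parameter stands in for the analyst's choice to ignore findings without an intuitive causal explanation.

```python
from collections.abc import Collection
from dataclasses import dataclass

# Hypothetical record format: (personal factors, whether the case was high risk).
Record = tuple[dict[str, bool], bool]


def lift(records: list[Record], factor: str) -> float:
    """High-risk rate among records with `factor`, divided by the overall
    high-risk rate. Values above 1 indicate a positive correlation."""
    overall = sum(risk for _, risk in records) / len(records)
    with_factor = [risk for feats, risk in records if feats.get(factor, False)]
    if not with_factor or overall == 0:
        return 0.0
    return (sum(with_factor) / len(with_factor)) / overall


@dataclass
class Profile:
    factors: list[str]  # factors retained by the analyst

    def explain(self, features: dict[str, bool]) -> list[str]:
        """Return the profile factors present in a new event. A non-empty
        list means the event is flagged; the list itself is the reason."""
        return [f for f in self.factors if features.get(f, False)]


def build_profile(records: list[Record], candidates: list[str],
                  threshold: float = 1.5,
                  excluded: Collection[str] = ()) -> Profile:
    """Keep candidate factors whose lift meets `threshold`, minus any the
    analyst excluded as implausible (no intuitive causal story)."""
    kept = [f for f in candidates
            if f not in excluded and lift(records, f) >= threshold]
    return Profile(kept)


# Illustrative data only; the factor names and values are invented.
history: list[Record] = [
    ({"late_payments": True,  "tall": True},  True),
    ({"late_payments": True,  "tall": False}, True),
    ({"late_payments": True,  "tall": False}, False),
    ({"late_payments": False, "tall": True},  True),
    ({"late_payments": False, "tall": True},  False),
    ({"late_payments": False, "tall": False}, False),
    ({"late_payments": False, "tall": False}, False),
    ({"late_payments": False, "tall": False}, False),
]

# Both factors correlate equally with risk (lift 16/9, about 1.78), but the
# analyst discards height as a finding with no plausible explanation.
profile = build_profile(history, ["late_payments", "tall"], excluded={"tall"})
print(profile.factors)                                         # ['late_payments']
print(profile.explain({"late_payments": True, "tall": True}))  # ['late_payments']
```

Restricting the profile to a short list of named factors is what makes the reason list possible; a more complex model might capture more of the signal but could not answer an external inquiry this directly, which mirrors the trade-off between interpretability and accuracy noted in the text.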

Interpretability also allows the analyst to go beyond correlation and search for a theory that could uncover causation. For instance, one-way, cash-only airline tickets could (in theory) be causally linked to terrorists planning to ignite explosives on an aircraft. Constructing a theory of causation linking these two dynamics is relatively simple (although not necessarily true). Other correlations might call for more elaborate theories of causation. Validating such theories will call for additional study, both of fact patterns and possibly in the field, all in an attempt to reveal the forms of causation in play. Therefore, requiring a theory of causation to be in place prior to taking action based on correlations would further assure the precision of the process. On the other hand, requiring causation theories might slow down the entire process and reduce its efficiency (and might even be an impossible task). In summary, policy decisions mandating interpretability and causation are subtle, but will have a substantial impact on the prospects for transparency throughout the process.

17.3 The Nature of Transparency in Predictive Modeling: Working through the Information Flow

A call for transparency arises when considering predictive data mining and its outcomes. Yet transparency can refer to a variety of segments throughout the predictive modeling process. Ensuring transparency at each segment generates its own specific forms of costs and balances, and derives from a different set of laws and justifications. In some instances, transparency might merely require uploading

7 This is mostly the case when more advanced tools of data mining are applied, such as decision tree learning. Since these tools generate specific concerns of their own, they will not be further addressed here. For a discussion of such instances, which at times involve tens of thousands of factors, see David Martens & Foster Provost, Explaining Documents' Classification, NYU Stern School of Business, Working Paper CeDER-11-01, http://archive.nyu.edu/handle/2451/29918.