disclosure of the information within them (with the exception of the relevant data
subjects). The difficult questions pertain to the nature of transparency regarding
the technical measures for collating these datasets. These are probably best kept
out of the public eye. Auditing of this process would be carried out
internally, with the help of selected experts.
These conclusions follow from accounting for the elements discussed above.
With the exception of the actual datasets used, I doubt whether any disclosures
made regarding this segment will generate sufficient public interest to “shame”
lower-level officials (who will be making most of the technical decisions) into
changing their practices. On the other hand, the “crowdsourcing” objective carries
merit at this juncture, especially regarding technical decisions. However, it could
probably be achieved by engaging in selective disclosure to experts.
The “autonomy”-based arguments do not provide substantial insights. Those
premised upon the rights of data subjects (addressed in section 4.2.3) might justify
additional disclosure of these factors, especially regarding the personal
information used in this process. Yet I am skeptical whether this theory (which, as
mentioned, suffers from several analytical flaws) can justify the disclosure of
ancillary information regarding the collation and matching process of the analysis.
Connecting disclosure requirements in this segment to the autonomy rights of those
affected by the data mining analysis, on the other hand, is quite a long shot, as
these concerns arise on the opposite side of the information flow (and are addressed
in section 4.2.4).
Segment (B) presents more of an analytic challenge. Currently, the public is left
almost entirely in the dark at this stage, in which the data is analyzed and patterns
formulated. This must change. Additional layers of disclosure should be applied to
both the technological elements and the human and policy decisions involved.
These arguments can be justified under several theories. Let us separately
approach the technology and policy aspects of this segment. In terms of the
technology used at this juncture, transparency will only serve as a minimal
“check” on governmental actions. The public would have limited interest in these
technical details. Thus, there is only limited potential for effective shaming.
“Crowdsourcing” is the argument that seems to have the greatest force. Both
autonomy-based arguments are quite a stretch, as “data subjects” and “affected
individuals” will have difficulty linking these rights to the actual computer
analysis. In view of the obvious detriments of sharing the technology with external
sources, a possible compromise calls for releasing the software to a select group
of experts from across the industry. These experts would be barred from sharing such
code commercially. They would, however, be able to inform the public if hidden
agendas are embedded within the code. This seems to be a reasonable policy
strategy for disclosure of technology-related information at this juncture.
Moving to the realm of policy decisions, I find that information concerning
decisions about the acceptable level of error within the process (sometimes
referred to in the technical jargon as “support” and “confidence” (Zarsky 2002-3))
requires greater transparency.
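To make this jargon concrete, the following short Python sketch (mine, and merely
illustrative rather than a description of any actual governmental system) shows how
the “support” and “confidence” of a mined rule are computed; the records, traits,
and the rule itself are hypothetical.

    # Illustrative sketch only: hypothetical records and rule.
    def support(records, itemset):
        # Fraction of all records that contain every item in `itemset`.
        return sum(1 for r in records if itemset <= r) / len(records)

    def confidence(records, antecedent, consequent):
        # Of the records containing the antecedent, the fraction that also
        # contain the consequent (an estimate of the rule's accuracy).
        base = support(records, antecedent)
        return support(records, antecedent | consequent) / base if base else 0.0

    # Hypothetical records of traits noted during an analysis.
    records = [
        {"cash_purchase", "one_way_ticket"},
        {"cash_purchase", "one_way_ticket", "flagged"},
        {"cash_purchase", "flagged"},
        {"one_way_ticket"},
    ]
    antecedent = {"cash_purchase", "one_way_ticket"}
    consequent = {"flagged"}
    print(support(records, antecedent))                 # 0.5
    print(confidence(records, antecedent, consequent))  # 0.5

In practice, analysts set minimum thresholds for both measures: lowering them admits
more (and weaker) rules into the model, while raising them admits fewer (and
stronger) ones. Deciding where to set these thresholds is precisely the kind of
error-tolerance choice at issue here.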
The internal balance between accuracy and security will generate public interest
that will prove to be an effective check on government. These decisions also impact
the personal autonomy of those affected