Combining and Analyzing Judicial Databases - Discrimination and Privacy in the Information Society


case is taken as the classification criterion. As a result, less severe offences 'disappear' in the data reported by the court.

It is important that existing semantic dependencies between attributes, if any, are preserved when data are integrated. In a data warehouse approach, domain experts therefore need to keep track of these dependencies; in a dataspace, they may be captured in rules.
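A dataspace rule of this kind can be sketched as a named predicate over a record. The sketch below assumes records are plain Python dictionaries; the `DependencyRule` class and the attribute names `case_classification` and `offences` are hypothetical. The example rule encodes one plausible dependency in the spirit of the paragraph above: the offence used to classify a case should be one of the offences recorded for that case.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict  # hypothetical: one case record, attribute name -> value


@dataclass(frozen=True)
class DependencyRule:
    """A semantic dependency between attributes, checked per record."""
    name: str
    holds: Callable[[Record], bool]


# Hypothetical rule: the offence used to classify a case must be
# one of the offences actually recorded for that case.
classification_rule = DependencyRule(
    name="classification-within-offences",
    holds=lambda r: r["case_classification"] in r["offences"],
)


def violations(records: list[Record], rules: list[DependencyRule]) -> list[tuple[str, Record]]:
    """Return (rule name, record) pairs for every record that breaks a rule."""
    return [(rule.name, r) for r in records for rule in rules if not rule.holds(r)]


cases = [
    {"case_id": 1, "case_classification": "assault", "offences": ["assault", "theft"]},
    {"case_id": 2, "case_classification": "fraud", "offences": ["theft"]},
]
print([r["case_id"] for _, r in violations(cases, [classification_rule])])  # [2]
```

Because rules are data rather than code hidden in a loading script, they can be inspected and extended as new dependencies are discovered.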

Resolving inconsistencies

The different judicial databases have overlapping or redundant attributes. Redundancy may introduce inconsistencies that have to be detected and resolved manually, based on domain expertise. Take, for example, the nationality of a suspect as recorded by different organizations. It is known that, in practice, foreigners tend to provide a wrong nationality when they are unable to show identification papers. As a result, inconsistencies may arise between the databases of different organizations. These can be resolved by utilizing domain knowledge.
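The domain knowledge about nationality could be applied as a resolution rule. The sketch below assumes, hypothetically, that each source records whether the suspect's identity was verified with identification papers (`id_verified`). When sources disagree, it prefers values from verified records, reflecting the observation that wrong nationalities tend to be given when no papers can be shown. The field names and the majority tie-breaking are illustrative assumptions, not the authors' method.

```python
from collections import Counter
from typing import Optional


def resolve_nationality(observations: list[dict]) -> Optional[str]:
    """Pick one nationality from the values recorded by different sources.

    Hypothetical rule: values from records where identity was verified
    with identification papers are trusted over unverified ones; among
    the trusted values, the most frequent one wins.
    """
    verified = [o["nationality"] for o in observations if o["id_verified"]]
    # Fall back to all recorded values if no source verified the identity.
    candidates = verified or [o["nationality"] for o in observations]
    if not candidates:
        return None
    # most_common(1) returns [(value, count)]; ties go to the value seen first.
    return Counter(candidates).most_common(1)[0][0]


suspect = [
    {"source": "police", "nationality": "X", "id_verified": False},
    {"source": "prosecution", "nationality": "Y", "id_verified": True},
]
print(resolve_nationality(suspect))  # Y
```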

Prior to loading data into a data warehouse, inconsistencies have to be identified and resolved. This means that all values of overlapping or redundant attributes have to agree with each other. In a dataspace approach, inconsistencies can be detected automatically and on the fly, using rules that check attributes coming from different sources.
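On-the-fly detection can be sketched as a check that runs at query time over whichever sources are present, reporting attributes whose values disagree instead of requiring agreement before loading. This assumes, hypothetically, that records from every source carry a shared person identifier `person_id`; the function name and source names are illustrative.

```python
from collections import defaultdict


def find_inconsistencies(sources: dict[str, list[dict]], attribute: str,
                         key: str = "person_id") -> dict:
    """Report keys whose value for `attribute` differs between sources.

    Returns {key: {source_name: value, ...}} for every key with more than
    one distinct value. Records lacking the attribute are skipped, and a
    later record from the same source overwrites an earlier one.
    """
    seen: dict = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            if attribute in record:
                seen[record[key]][source_name] = record[attribute]
    return {k: vals for k, vals in seen.items() if len(set(vals.values())) > 1}


sources = {
    "police": [{"person_id": 1, "nationality": "X"}, {"person_id": 2, "nationality": "Z"}],
    "prosecution": [{"person_id": 1, "nationality": "Y"}, {"person_id": 2, "nationality": "Z"}],
}
print(find_inconsistencies(sources, "nationality"))
# {1: {'police': 'X', 'prosecution': 'Y'}}
```

Flagged cases can then be passed to a resolution rule based on domain knowledge rather than being fixed by hand before integration.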

Handling semantic changes

Data evolve over time as rules and regulations change. Therefore, certain values of certain attributes may have acquired a different meaning over time. For instance, due to municipal reorganizations in the Netherlands, the names of municipalities and cities have changed, while the old registered names were not always updated. Over time, the meaning of the old names may become unknown. Moreover, when cities are expanded, their names mean something different before the reorganization than after it. If these changes are not recorded, data may be combined improperly or wrong conclusions may be drawn from them. To keep track of the 'history' of the attributes, semantic changes have to be recorded. In a dataspace, this can be done in the relationship manager.
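One way to record such changes is to store, for each registered name, what it denoted during a validity period, so that a record can be interpreted according to its registration date. The `RelationshipManager` and `NameMeaning` classes below are a hypothetical stand-in, not the actual design of a relationship manager, and all place names, dates, and area identifiers are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class NameMeaning:
    """What a registered municipality name denoted during a period."""
    name: str
    valid_from: date
    valid_to: Optional[date]  # None means the meaning is still valid
    area_id: str              # hypothetical identifier of the territory meant


class RelationshipManager:
    """Hypothetical store of semantic changes for municipality names."""

    def __init__(self) -> None:
        self._meanings: list[NameMeaning] = []

    def record_change(self, meaning: NameMeaning) -> None:
        self._meanings.append(meaning)

    def interpret(self, name: str, on: date) -> Optional[str]:
        """Return the area a name referred to on a given date, or None if unknown."""
        for m in self._meanings:
            if m.name == name and m.valid_from <= on and (m.valid_to is None or on < m.valid_to):
                return m.area_id
        return None


rm = RelationshipManager()
# Invented example: "Stad A" is expanded on 2001-01-01 by absorbing "Dorp B".
rm.record_change(NameMeaning("Stad A", date(1900, 1, 1), date(2001, 1, 1), "area:A-old"))
rm.record_change(NameMeaning("Stad A", date(2001, 1, 1), None, "area:A+B"))
rm.record_change(NameMeaning("Dorp B", date(1900, 1, 1), date(2001, 1, 1), "area:B"))

print(rm.interpret("Stad A", date(1995, 6, 1)))  # area:A-old
print(rm.interpret("Stad A", date(2005, 6, 1)))  # area:A+B
```

The same name thus resolves to different territories depending on when it was registered, which is exactly the information that is lost if the change is not recorded.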

Concluding example

In general, a dataspace approach may be considered more efficient and practical than a data warehouse approach, because in the former it is easier to combine data and add new sources, as there is no need to reconcile the data in advance. Additionally, with a dataspace approach, dependencies, inconsistencies, and changes can be managed more effectively.
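The difference in how sources are added can be sketched as follows. The `Dataspace` class, the shared key `person_id`, and the outer-join semantics are illustrative assumptions: adding a source is a single registration call with no reconciliation step, and because sources are combined with an outer join at query time, records that occur in only one source are kept rather than rejected.

```python
from typing import Any


class Dataspace:
    """Hypothetical dataspace: sources are stored as-is and combined per query."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.sources: dict[str, list[dict[str, Any]]] = {}

    def add_source(self, name: str, records: list[dict[str, Any]]) -> None:
        # No schema mapping or reconciliation here; the source is only registered.
        self.sources[name] = records

    def combine(self, *names: str) -> dict[Any, dict[str, dict[str, Any]]]:
        """Outer-join the named sources on the key at query time.

        Each key maps to {source_name: record}; a source with no record for
        a key is simply absent from the inner dict. If a source holds several
        records for one key, the last one is kept.
        """
        combined: dict[Any, dict[str, dict[str, Any]]] = {}
        for name in names:
            for record in self.sources[name]:
                combined.setdefault(record[self.key], {})[name] = record
        return combined


ds = Dataspace(key="person_id")
ds.add_source("police", [{"person_id": 1}, {"person_id": 2}])
ds.add_source("prosecution", [{"person_id": 1, "prosecuted": True}])
view = ds.combine("police", "prosecution")
print(sorted(view))     # [1, 2]
print(sorted(view[2]))  # ['police']
```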

As an illustration, assume that one wants to know how many of the suspects questioned by the police are handed over to the prosecution and how many of them are actually prosecuted. To answer this question, the databases of the police (HKS) and the prosecution (OM-data) have to be integrated. However, OM-data only contains data on cases that are handled by the prosecution. This means that not all individuals in HKS are present in OM-data and, therefore, combining on an individual level, which is needed in a data warehouse approach, is impossible for
