Combining and Analyzing Judicial Databases - Discrimination and Privacy in the Information Society

sense that a common data model is not required and that there is no need to link data based on unique identifiers. As a result, in a dataspace approach not only microdata but also aggregate data may be used. This does not alter the fact that a dataspace layer may contain a data warehouse as a data source.

The worked-out examples from the Dutch criminal justice chain illustrate that data integration can be executed in a variety of ways. For instance, depending on the needs of the users or the availability of the data, parts of this process may have to be altered. The next section shows how potential problems associated with linking (crime) data affect the data integration process and the choices made in it.

10.4 Challenges in Combining Judicial Data

The main problem with data integration in the field of justice is that, although it can largely be automated, a significant amount of manual effort is still required. The main reason for this is the nature of crime data: redundancy, inconsistencies, dependencies, and semantic changes are not uncommon. The remainder of this section describes these potential problems and their consequences for the data integration process in detail.

Taking care of quantitative and qualitative dependencies

One of the problems with reconciling judicial data is that quantitative dependencies exist between different data sources. For example, the date on which a crime is reported is usually the same as the date on which the crime is committed, and the output of the police is usually greater than the input into the prosecutorial level. Though some of this knowledge may be exploited for data reconciliation (to compare records from different sources), it requires manual effort and the participation of domain experts.
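The two example dependencies can be expressed as simple checks. The following is a minimal sketch, not a method described in the text: the field names (`id`, `committed`, `reported`), the per-year count dictionaries, and all figures are invented for illustration. Because both dependencies hold only "usually", the functions flag records for review by a domain expert rather than reject them.

```python
from datetime import date


def flag_date_mismatches(records):
    """Return ids of records whose report date differs from the offence date."""
    return [r["id"] for r in records if r["reported"] != r["committed"]]


def flag_flow_violations(police_output, prosecution_input):
    """Return the years in which prosecution input exceeds police output."""
    shared_years = police_output.keys() & prosecution_input.keys()
    return sorted(
        year for year in shared_years
        if prosecution_input[year] > police_output[year]
    )


# Invented sample data; the field names are hypothetical.
records = [
    {"id": "A1", "committed": date(2010, 3, 1), "reported": date(2010, 3, 1)},
    {"id": "A2", "committed": date(2010, 3, 1), "reported": date(2010, 5, 9)},
]
police_output = {2009: 1200, 2010: 1100}
prosecution_input = {2009: 900, 2010: 1150}

date_flags = flag_date_mismatches(records)
flow_flags = flag_flow_violations(police_output, prosecution_input)
```

With this sample data, record A2 and the year 2010 are flagged; deciding whether a flag reflects a real error still calls for domain knowledge.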

Qualitative dependencies also exist within databases. For instance, it is generally assumed that the value of a certain attribute does not change dramatically within a few years. It is therefore recommended to compare the value of an attribute in a certain year with its values in preceding years in order to detect large deviations.

Thus, when data from different sources are combined, both quantitative and qualitative dependencies have to be managed in order to avoid unreliable data. In a data warehouse this has to be done manually by domain experts. In a dataspace approach it can be fully automated using dynamic rules that check the reliability of the data and detect deviations.
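As an illustration of such a rule, the comparison of an attribute with its value in preceding years might be sketched as follows. This is an assumption-laden example, not the rule mechanism of any particular dataspace system: the function name, the 25% threshold, and the yearly counts are invented, and for simplicity each year is compared only with the immediately preceding year in the series.

```python
def flag_deviations(series, threshold=0.25):
    """Return (year, relative_change) pairs where the value differs from the
    preceding year's value by more than `threshold` (a hypothetical cutoff)."""
    flags = []
    years = sorted(series)
    for prev, cur in zip(years, years[1:]):
        base = series[prev]
        if base == 0:
            continue  # relative change is undefined for a zero baseline
        change = (series[cur] - base) / base
        if abs(change) > threshold:
            flags.append((cur, round(change, 3)))
    return flags


# Invented yearly values of a single attribute.
counts = {2007: 1000, 2008: 1040, 2009: 1560, 2010: 1500}
deviations = flag_deviations(counts)
```

Here only 2009 is flagged, because its value is 50% above that of 2008. A suitable threshold would likely differ per attribute and would need to be chosen with domain experts.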

Managing semantic dependencies

Besides quantitative and qualitative dependencies, semantic dependencies also exist in and between judicial databases. These arise because different organizations in the criminal law chain store data about the same events, but often label or classify these data differently. For example, in the case of a robbery a victim may classify it as a violent crime, while the police may classify it as a crime against property. Additionally, for a single case in court that contains several offences, the severest
