data on an individual level involves data reconciliation, that is, the identification
of data in different sources that refer to the same entity. In Subsection 10.3.1 it is
shown how this can be established using a data warehouse approach.
Alternatively, policymakers need combined data at an aggregate level. They
want to gain insight into the criminal law system as a whole, for instance, to an-
swer the question of which kinds of suspects are brought to court and which kinds
of cases are settled out of court. Such insight may be relevant to them in order to
be able to define an effective policy. To provide them with this information, the
different databases also have to be combined, but not on an individual level; in this
case a higher-level view is more useful, as will be shown in Subsection 10.3.2. That
subsection presents a dataspace approach in which aggregate data
are related.
10.3.1 A Data Warehouse Approach to Combining Judicial Data
A data warehouse is a central repository of data collected from different sources. 4
These data are stored and structured in such a way that querying and reporting are
facilitated. It provides a uniform data model for all data regardless of their source.
Generally, a data warehouse consists of three layers that provide storage of the
original data sources, integration, and access (see Figure 10.2). First, the raw data
from different databases are extracted. Subsequently, these data are cleaned, trans-
formed, and loaded into the data warehouse. The data warehouse then contains
data from different databases that are combined and ordered. In addition, information
about the data in the data warehouse is stored in a metadatabase, which records
the sources and history of the data. Finally, data from the data warehouse are
provided to end-users through data
marts. The key step in developing a data warehouse is data integration; therefore,
data reconciliation is of crucial importance. 5
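The extract, clean/transform, load, and data mart steps described above can be illustrated with a minimal sketch. It assumes two hypothetical source systems held as in-memory lists; all field names, the uniform schema, and the helper functions are illustrative assumptions and are not taken from the systems discussed in this chapter.

```python
from datetime import datetime, timezone

# Hypothetical raw extracts from two source systems (field names are assumptions).
police_rows = [{"case": "P-1", "offence": " Theft ", "yr": "2009"}]
court_rows = [{"case_no": "C-7", "offence_type": "FRAUD", "year": 2010}]


def transform(row, source, mapping):
    """Map a source-specific row onto one uniform schema and clean its values."""
    record = {target: row[src] for target, src in mapping.items()}
    record["offence"] = str(record["offence"]).strip().lower()
    record["year"] = int(record["year"])
    record["source"] = source  # keep provenance with every record
    return record


def load(sources):
    """Build the warehouse table plus a metadata log describing each load."""
    warehouse, metadata = [], []
    for name, (rows, mapping) in sources.items():
        cleaned = [transform(r, name, mapping) for r in rows]
        warehouse.extend(cleaned)
        metadata.append({
            "source": name,
            "rows": len(cleaned),
            "loaded_at": datetime.now(timezone.utc).isoformat(),
        })
    return warehouse, metadata


def data_mart(warehouse, year):
    """A data mart is modelled here as a subject-specific slice for end-users."""
    return [r for r in warehouse if r["year"] == year]


# Each source supplies its rows and a mapping: uniform field -> source field.
sources = {
    "police": (police_rows, {"case_id": "case", "offence": "offence", "year": "yr"}),
    "court": (court_rows, {"case_id": "case_no", "offence": "offence_type", "year": "year"}),
}
warehouse, metadata = load(sources)
mart_2010 = data_mart(warehouse, 2010)
```

In this sketch the metadata log plays the role of the metadatabase: it records where each batch came from and when it was loaded. A real warehouse would persist both tables in a database rather than in Python lists.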
The main problem with combining and integrating crime data is that only a few
organizations with an operational task are allowed (by law) to combine data based
on unique identifiers or a set of privacy-sensitive attributes. For this reason, before
making crime data available for research purposes, privacy-sensitive attributes are
stripped from the databases. Hence, for data reconciliation other overlapping in-
formation in the to-be-combined databases has to be exploited. This can either be
information about the database schemata or information that is extracted from the
database content. Furthermore, in order to be able to utilize this information, do-
main knowledge from experts is needed.
In practice, to establish whether two records from different database systems
denote the same object, the following general rule of thumb can be applied: 6 the
larger the number of common attributes with the same values for two records from
two different systems, the higher the chance that the records relate to the same
4 Kimball, R. & Ross, M. (2002).
5 Choenni, S., van Dijk, J. & Leeuw, F. (2010).
6 Choenni, S., van Dijk, J. & Leeuw, F. (2010), Choenni, S. & Meijer, R. (2011).
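The rule of thumb of counting common attributes with equal values can be sketched as follows. This is only an illustration under stated assumptions: the attribute names, the example records, and the `min_score` threshold are hypothetical, and the sketch uses a plain count of agreeing attributes where a real reconciliation procedure would also rely on expert domain knowledge.

```python
def agreement_score(rec_a, rec_b):
    """Count shared attributes on which both records hold the same non-missing value."""
    common = set(rec_a) & set(rec_b)
    return sum(1 for k in common if rec_a[k] is not None and rec_a[k] == rec_b[k])


def best_matches(left, right, min_score):
    """Pair each left record with its highest-scoring right record, if it reaches min_score."""
    if not right:
        return []
    pairs = []
    for i, a in enumerate(left):
        scored = [(agreement_score(a, b), j) for j, b in enumerate(right)]
        score, j = max(scored, key=lambda s: s[0])  # ties resolve to the first candidate
        if score >= min_score:
            pairs.append((i, j, score))
    return pairs


# Hypothetical records with privacy-sensitive attributes already stripped.
system_a = [{"birth_year": 1980, "sex": "M", "offence": "theft", "region": "north"}]
system_b = [
    {"birth_year": 1980, "sex": "M", "offence": "theft", "court": "A"},
    {"birth_year": 1975, "sex": "M", "offence": "fraud", "court": "B"},
]

matches = best_matches(system_a, system_b, min_score=3)
```

Here the first record of `system_b` agrees with the `system_a` record on three attributes and the second on only one, so only the first pair reaches the assumed threshold.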