techniques to such data. In this way, profiles of criminals or offenders may be
constructed. In the next section, potentials and challenges of analyzing combined
crime data will be described.
10.6 Risks of Analyzing Judicial Data
Statistics may be considered a standard tool for the analysis of police and
justice data. However, as in many organizations, the amount of data collected and
stored by judicial organizations has grown exponentially. In many fields,
especially technically oriented ones, data mining has proven to add value over
statistics when analyzing large amounts of data. [13] See Choenni et al. (2005)
for a summary of the differences between statistics and data mining. Data mining
is the process of searching for statistical relations, or patterns, in large data
sets. It is often used to gain a different perspective on the data and to extract
useful information from them. Commonly used methods include rule learning
(searching for relationships in the data), clustering (discovering groups in the
data that are similar), and classification (generalizing known structures to new
data). Data mining can thus reveal useful knowledge that is hidden in a large
amount of data, and there is a growing interest in applying data mining
techniques to crime data.
However, the straightforward application of statistical techniques, and data
mining in particular, may be risky. As has been pointed out in the literature (Hand,
1998), data mining results need to be evaluated by experts to determine whether
they hold in the real world. The main reason for this is that data mining is based
on induction and, therefore, the results may be true given the data, but not in the
real world. For example, assume that all swans in a given database are white;
then it may be induced from the database that all swans are white. However, it may
well be that only features of white swans are stored in the database and
that the very small group of black swans is neglected. As a result, the induced
knowledge with regard to swans does not hold in the real world. Therefore, it is of
vital importance to evaluate the validity of data mining results.
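
The swan example can be made concrete with a minimal sketch. The record layout,
the counts (990 white and 10 black swans), and the helper name
induce_colour_rule are all hypothetical and chosen only for illustration; they
do not come from the text. The sketch induces the rule "swans are <colour>"
from a set of records and reports how often the rule holds in that set.

```python
from collections import Counter


def induce_colour_rule(records):
    """Return (colour, confidence) for the most common colour in the records.

    Confidence is the fraction of records that match the induced colour.
    """
    counts = Counter(r["colour"] for r in records)
    colour, n = counts.most_common(1)[0]
    return colour, n / len(records)


# Hypothetical "real world": mostly white swans, plus a small group of black ones.
population = (
    [{"species": "swan", "colour": "white"}] * 990
    + [{"species": "swan", "colour": "black"}] * 10
)

# Hypothetical database in which only white swans were ever recorded.
database = [r for r in population if r["colour"] == "white"]

db_rule = induce_colour_rule(database)       # rule induced from the database
world_rule = induce_colour_rule(population)  # same rule checked against the world

print(f"database: swans are {db_rule[0]}, confidence {db_rule[1]:.2f}")
print(f"world:    swans are {world_rule[0]}, confidence {world_rule[1]:.2f}")
```

Within the database the rule holds without exception, yet it fails for part of
the population. Nothing inside the database reveals this gap, which is why the
induced rule has to be checked against outside knowledge.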
For police and justice data, evaluation is even more important, for the following
reasons. In contrast to findings in the exact or technical sciences, findings in
the social sciences may change in the course of time. For instance, Newton's laws
of motion were true decades ago and still hold today, while the age-crime
distribution in crime science changes over time. In 2000, minors were responsible
for roughly 17% of the committed crimes (that is, of all interrogated suspects,
17% were between 12 and 17 years old), while in 2007 they were responsible for
around 19% of the committed crimes. [14]
Another reason to be cautious with data mining results in social sciences is the
fact that, since data collection is a time-consuming and difficult process, legacy
databases are often used for data mining. Such databases contain large amounts of
[13] Choenni, S., Bakker, R., Blok, H. & De Laat, R. (2005); Hand, D.J. (1998);
Tan, P., Steinbach, M. & Kumar, V. (2005).
[14] De Heer-de Lange, N.E. & Kalidien, S. (2010).