case is taken as the classification criterion. As a result, less severe offences
'disappear' in the data reported by the court.
It is important that existing semantic dependencies between attributes (if any)
are preserved while integrating data. Therefore, in a data warehouse, domain
experts need to keep track of these dependencies. In a dataspace, they can be
captured in rules.
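As a rough illustration of capturing a dependency in a rule, the sketch below assumes, for illustration only, that a court record's `offence` attribute is derived from all offences in the case by keeping the most severe one. The severity ranking, the attribute names (`offence`, `case_offences`, `source`) and the `DependencyRule` structure are hypothetical, not taken from any actual judicial database or dataspace system.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical severity ranking (higher = more severe); in practice this
# ordering would come from domain experts.
SEVERITY = {"theft": 1, "assault": 2, "robbery": 3}


@dataclass(frozen=True)
class DependencyRule:
    """A named semantic dependency between attributes of one source."""
    name: str
    source: str
    holds: Callable[[dict], bool]


# Assumed dependency: the court's 'offence' attribute is derived from the
# offences in the case by keeping only the most severe one.
court_classification = DependencyRule(
    name="offence = most severe of case_offences",
    source="court",
    holds=lambda r: r["offence"] == max(r["case_offences"], key=SEVERITY.__getitem__),
)


def violations(rule: DependencyRule, records: list) -> list:
    """Return the records from the rule's source that break the dependency."""
    return [r for r in records if r["source"] == rule.source and not rule.holds(r)]


records = [
    {"source": "court", "offence": "robbery", "case_offences": ["theft", "robbery"]},
    {"source": "court", "offence": "theft", "case_offences": ["theft", "assault"]},
    {"source": "police", "offence": "theft", "case_offences": ["theft"]},
]
print(len(violations(court_classification, records)))  # 1
```

Because the dependency is stated explicitly, anyone querying the court data can see that less severe offences in a case, such as the theft in the first record, are not reflected in its `offence` attribute.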
Resolving inconsistencies
The different judicial databases have overlapping or redundant attributes. Redundancy
may introduce inconsistencies that have to be detected and resolved manually
based on domain expertise. Take, for example, the nationality of a suspect as
recorded by different organizations. It is known that, in practice, foreigners tend to
provide an incorrect nationality when they are unable to show identification papers.
As a result, the databases of different organizations may disagree on a suspect's
nationality. Such inconsistencies can be resolved by applying this domain knowledge.
Prior to loading data into a data warehouse, inconsistencies have to be identified
and resolved. This means that the values of overlapping or redundant attributes
must agree with each other. In a dataspace approach, inconsistencies can be
detected automatically and on the fly, using rules that compare attributes
coming from different sources.
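A minimal sketch of such a rule, applied to the nationality example, might look as follows. It assumes a shared `person_id` across sources and an `id_verified` flag recording whether identification papers were shown; both are hypothetical, as are the source labels. The resolution step encodes the domain knowledge above as an assumed preference for values recorded with verified papers, and defers to manual review otherwise.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Observation:
    source: str        # hypothetical source label, e.g. "police"
    person_id: str     # hypothetical identifier shared by the sources
    nationality: str
    id_verified: bool  # assumed attribute: were identification papers shown?


def detect_conflicts(observations: List[Observation]) -> Dict[str, List[Observation]]:
    """Rule: all sources must agree on a person's nationality; return the violations."""
    by_person: Dict[str, List[Observation]] = {}
    for obs in observations:
        by_person.setdefault(obs.person_id, []).append(obs)
    return {pid: group for pid, group in by_person.items()
            if len({o.nationality for o in group}) > 1}


def resolve(group: List[Observation]) -> Optional[str]:
    """Assumed domain rule: trust the nationality recorded with verified papers.

    Returns None when no single verified value exists, i.e. manual review is needed.
    """
    verified = {o.nationality for o in group if o.id_verified}
    return verified.pop() if len(verified) == 1 else None


observations = [
    Observation("police", "p1", "NL", id_verified=False),
    Observation("prosecution", "p1", "BE", id_verified=True),
    Observation("police", "p2", "DE", id_verified=True),
    Observation("prosecution", "p2", "DE", id_verified=True),
]
conflicts = detect_conflicts(observations)
print({pid: resolve(group) for pid, group in conflicts.items()})  # {'p1': 'BE'}
```

Since the check runs at query time rather than at load time, new sources can be added without first reconciling their nationality values with the existing ones.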
Handling semantic changes
Data evolve over time as rules and regulations change. Therefore, the values of
certain attributes may acquire a different meaning over time. For instance, due
to municipal reorganizations in the Netherlands, names of municipalities and cities
have changed, while the old registered names were not always updated. Over time,
the meaning of the old names may become unknown. Moreover, when cities are
expanded, their names mean something different before the reorganization than
after it. If these changes are not recorded, data may be combined improperly or
wrong conclusions may be drawn from them. To keep track of the 'history' of the
attributes, semantic changes have to be recorded. In a dataspace, this can be done
in the relationship manager.
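The text does not describe the relationship manager's interface, so the following is only a hypothetical sketch of the kind of history it could keep: each place name is stored with a validity period and the area it covered, so a record is interpreted according to the date it was registered. All names, dates, and districts below are invented for illustration and do not describe actual Dutch municipalities.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class NameMeaning:
    """What a place name denoted during a period (all values hypothetical)."""
    name: str
    valid_from: date
    valid_to: Optional[date]  # None means the meaning is still current
    districts: FrozenSet[str]


# Hypothetical history: in 2001 Riverton absorbed Northdale and Eastfield,
# so "Riverton" covers a larger area after the reorganization than before.
HISTORY: List[NameMeaning] = [
    NameMeaning("Riverton", date(1900, 1, 1), date(2001, 1, 1), frozenset({"centre"})),
    NameMeaning("Northdale", date(1900, 1, 1), date(2001, 1, 1), frozenset({"north"})),
    NameMeaning("Eastfield", date(1900, 1, 1), date(2001, 1, 1), frozenset({"east"})),
    NameMeaning("Riverton", date(2001, 1, 1), None, frozenset({"centre", "north", "east"})),
]


def _valid(m: NameMeaning, on: date) -> bool:
    return m.valid_from <= on and (m.valid_to is None or on < m.valid_to)


def meaning(name: str, on: date) -> FrozenSet[str]:
    """Return the area a name referred to on a given date."""
    for m in HISTORY:
        if m.name == name and _valid(m, on):
            return m.districts
    raise LookupError(f"no recorded meaning for {name!r} on {on}")


def names_covering(districts: FrozenSet[str], on: date) -> List[str]:
    """Return the names valid on a date whose area contains the given districts."""
    return [m.name for m in HISTORY if _valid(m, on) and districts <= m.districts]


# A 1995 record mentioning Northdale maps to Riverton in 2005 terms.
print(names_covering(meaning("Northdale", date(1995, 6, 1)), date(2005, 6, 1)))  # ['Riverton']
print(sorted(meaning("Riverton", date(1995, 6, 1))))  # ['centre']
```

With this design, a record registered after 2001 that still uses an old name such as "Northdale" raises a `LookupError` instead of being silently combined with data that uses the new meaning.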
Concluding example
In general, a dataspace approach may be considered more efficient and practical
than a data warehouse approach, because in the former it is easier to combine
data and add new sources, as there is no need for data reconciliation. Additionally,
with a dataspace approach, dependencies, inconsistencies, and changes can be
managed more effectively.
As an illustration, assume that one wants to know how many of the suspects
questioned by the police are handed over to the prosecution and how many of
them are actually prosecuted. To answer this question, the databases of the police
(HKS) and the prosecution (OM-data) have to be integrated. However, OM-data
contains data only on cases handled by the prosecution. This means that
not all individuals in HKS are present in OM-data and, therefore, combining the two
at the individual level, which is needed in a data warehouse approach, is impossible for