4.2 Application: Categorial Semantics for Data Integration/Exchange
Data exchange [4] is the problem of taking data structured under a source schema and
creating an instance of a target schema that reflects the source data as accurately as
possible.
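To make the data-exchange setting concrete, here is a minimal Python sketch. It materializes a target instance from a source instance using one source-to-target tuple-generating dependency (s-t tgd) and the naive chase, a standard technique in the data-exchange literature. The chase invents labeled nulls for values the source does not supply. The relations Emp, Works and Dept, and the dependency itself, are hypothetical examples that do not come from the text.

```python
from itertools import count

# Hypothetical source instance Emp(name, dept_name); note the duplicate alice tuple.
source_emp = [("alice", "sales"), ("bob", "it"), ("carol", "sales"), ("alice", "sales")]


def chase_emp_tgd(emp):
    """Naive chase of the hypothetical s-t tgd
        Emp(n, d) -> EXISTS i . Works(n, i) AND Dept(i, d)

    Returns a target instance (works, dept) in which the unknown
    department ids are labeled nulls _N1, _N2, ...
    """
    null_ids = count(1)
    works, dept = [], []
    for name, dname in emp:
        # The tgd already holds for this tuple if some id i has
        # Works(name, i) and Dept(i, dname) in the target.
        satisfied = any(w_name == name and (w_id, dname) in dept
                        for w_name, w_id in works)
        if not satisfied:
            dept_id = f"_N{next(null_ids)}"  # fresh labeled null
            works.append((name, dept_id))
            dept.append((dept_id, dname))
    return works, dept


works, dept = chase_emp_tgd(source_emp)
```

In this example the repeated `("alice", "sales")` tuple does not fire the dependency a second time, because the target already satisfies it.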
Data integration [10], instead, is the problem of combining data residing at different
sources and providing the user with a unified global schema over these data.
We describe the architecture of a tool for semantic data integration (depicted in
Fig. 4.1), based on commercial systems for information integration, and explain its
behavior. Such a system can be viewed as a plug-in that cooperates with
commercial data federation systems in order to achieve effective data integration.
This architecture is able to manage complex integration environments in which
a number of heterogeneous sources are integrated under a common view. The
system provides the user with a classical database interface through which the
sources are transparently integrated and queried. The independence and autonomy
of the data sources are also preserved by means of a fully virtual approach: the data
remain at the sources and are accessed only at query time.
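A small Python sketch of a mediator illustrates the fully virtual approach. Each relation of the global schema is defined as a query over the sources, and that query is re-evaluated against the live sources every time the relation is scanned. No source data are copied, and the sources keep their own local schemas. The class `VirtualMediator`, the global relation `Customer` and the two in-memory sources are hypothetical. The sketch follows the spirit of global-as-view (GAV) mappings and is not the implementation of the tool in Fig. 4.1.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[str, str]

# Hypothetical autonomous sources with different local schemas.
eu_crm = {"clients": [("Acme", "DE"), ("Initech", "FR")]}  # (name, country)
us_crm = {"accounts": [("CA", "Globex")]}                   # (state, company)


class VirtualMediator:
    """Answers queries on global relations by evaluating their definitions
    over the sources at query time; nothing is materialized."""

    def __init__(self) -> None:
        self.definitions: Dict[str, Callable[[], Iterable[Row]]] = {}

    def define(self, relation: str, query: Callable[[], Iterable[Row]]) -> None:
        # Store the mapping: a global relation defined as a query over the sources.
        self.definitions[relation] = query

    def scan(self, relation: str) -> List[Row]:
        # Re-run the definition against the current state of the sources.
        return list(self.definitions[relation]())


mediator = VirtualMediator()
# Hypothetical global relation Customer(name, region): a union over both sources.
mediator.define("Customer", lambda: (
    [(name, "EU") for name, _country in eu_crm["clients"]]
    + [(company, "US") for _state, company in us_crm["accounts"]]
))
```

Because `scan` re-evaluates the definition on every call, a row appended to `us_crm["accounts"]` appears in the next scan of `Customer` without any refresh step.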
To deal with the information integration problem, commercial systems mainly act as
data federation tools, enabling users to access remote data sources as if they were
contained in the local database. However, they do not let the designer define an
arbitrary description of the domain of interest. In particular, two main
limitations restrict the practical usefulness of typical data federation tools:
External data sources are modeled as aliases that can be queried as if they were
local tables. However, the correspondence between a remote data source and its local
alias is always one-to-one; the designer cannot define a more expressive
correspondence between a virtual concept of interest and, for example, a view or
a query over multiple source data sets.
Furthermore, if the sources are relational, the schemas of the aliases are identical to
those of the modeled source relations, which means that both have the same
number, names and types of attributes. Even though the definition of aliases
is sometimes a little more flexible for non-relational data sources, the designer still
has to follow several rules that limit the expressiveness of the correspondence
even within a single source data set.
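SQLite's `ATTACH DATABASE`, driven from Python's `sqlite3` module, is a very small-scale stand-in for a federation layer that can illustrate both limitations. One connection sees two separate databases as a single resource. The alias `customer_alias` mirrors one source table one-to-one, with the same column names and types. The concept `customer_revenue`, by contrast, is defined by a query that joins two sources, renames columns and aggregates, which is the kind of correspondence the text says typical federation aliases do not allow. The database names, tables and data are hypothetical. TEMP views are used because SQLite only lets temporary views refer to tables in other attached databases.

```python
import sqlite3

# One connection federates two hypothetical in-memory source databases.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE crm.customer (cust_id INTEGER, cust_name TEXT)")
conn.execute("CREATE TABLE billing.invoice (inv_id INTEGER, cust_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm.customer VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO billing.invoice VALUES (?, ?, ?)",
                 [(10, 1, 250.0), (11, 2, 99.5), (12, 1, 40.0)])

# Federation-style alias: one-to-one with a single source table,
# same number, names and types of columns.
conn.execute("CREATE TEMP VIEW customer_alias AS SELECT * FROM crm.customer")

# More expressive correspondence: a virtual concept defined by a query
# that joins two sources, renames columns and aggregates.
conn.execute("""
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.cust_name AS name, SUM(i.amount) AS revenue
    FROM crm.customer AS c
    JOIN billing.invoice AS i ON i.cust_id = c.cust_id
    GROUP BY c.cust_id, c.cust_name
""")

# Column names of the alias (second field of each PRAGMA table_info row).
alias_cols = [row[1] for row in conn.execute("PRAGMA table_info(customer_alias)")]
concept_rows = conn.execute(
    "SELECT name, revenue FROM customer_revenue ORDER BY name").fetchall()
```

Here `alias_cols` simply repeats the source columns, while `concept_rows` holds one row per customer with the revenue computed across both databases.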
The above-mentioned limitations are typical of data federation tools. Indeed, with this kind
of tool, the designer is simply provided with a view of the data sources that is strongly
source-dependent, since it basically reflects the source structure.
Commercial solutions for data integration can rely on the following features:
Database Federation Tools: allow multiple databases to be seen as a single resource;
Heterogeneous Source Access: the capability to access relational and non-relational
data simultaneously;
Grid Computing Tools: overcome the problem of source location.
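Heterogeneous source access can be pictured as one access layer that reads a relational source and a non-relational source side by side and returns combined tuples. The Python sketch below assumes a hypothetical SQLite table `orders` as the relational source and a hypothetical JSON document of customer profiles as the non-relational source. It joins them in application code. It does not model how any particular commercial product implements this feature.

```python
import json
import sqlite3

# Hypothetical relational source: an SQLite table of orders.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, total REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
               [(1, "Acme", 120.0), (2, "Globex", 75.0), (3, "Acme", 30.0)])

# Hypothetical non-relational source: a JSON document of customer profiles.
profiles_json = '{"Acme": {"country": "DE"}, "Globex": {"country": "US"}}'


def orders_with_country():
    """Combine rows of the relational source with a field of the JSON source.

    Customers missing from the JSON document get country None.
    """
    profiles = json.loads(profiles_json)
    rows = db.execute("SELECT order_id, customer, total FROM orders ORDER BY order_id")
    return [(oid, cust, total, profiles.get(cust, {}).get("country"))
            for oid, cust, total in rows]
```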
Some experimental tools based on query rewriting, which extend the limited
features of Database Federation Tools by addressing consistent query answering in
the data integration framework with key, exclusion and inclusion dependencies
over the global schema, can be found in [3].
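Consistent query answering can be illustrated for the simplest case, a single key dependency over one global relation. Under a key, a repair keeps exactly one tuple for each key value. A consistent (certain) answer is a tuple that the query returns in every repair. The Python sketch below enumerates all repairs by brute force, which takes exponential time. It is not the query-rewriting technique of the tools cited above, and it ignores exclusion and inclusion dependencies. The relation `employee` and the query `in_it` are hypothetical.

```python
from itertools import groupby, product

# Hypothetical global relation Employee(name, dept) with key 'name';
# the integrated sources disagree about bob's department.
employee = [("alice", "sales"), ("bob", "it"), ("bob", "hr"), ("carol", "it")]


def key_repairs(rows, key_index=0):
    """Enumerate the repairs of rows w.r.t. a key on column key_index:
    each repair keeps exactly one tuple per key value."""
    ordered = sorted(set(rows), key=lambda r: r[key_index])
    groups = [list(g) for _, g in groupby(ordered, key=lambda r: r[key_index])]
    return [set(choice) for choice in product(*groups)]


def consistent_answers(rows, query, key_index=0):
    """Certain answers: results returned by query in every repair."""
    answers = [set(query(repair)) for repair in key_repairs(rows, key_index)]
    return set.intersection(*answers) if answers else set()


def in_it(rows):
    """Hypothetical query: names of employees working in the 'it' department."""
    return {name for name, dept in rows if dept == "it"}
```

In this example bob is in `it` in one repair but not in the other, so only carol is a consistent answer to `in_it`. Every employee name, however, is a consistent answer to a query that projects on `name`.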