Database Reference
INTRODUCTION
Databases hold data that represent properties of real-world objects. Ideally, a set of real-world objects
can be described by the constructs of a single data model and stored in exactly one database. In practice,
however, two or more databases often store information about the same real-world objects. Several
factors lead to such overlapping representations:
a. The same real-world object plays different roles in different applications. For example, a company
can be both a customer and a supplier of a firm, so its information is found in both the firm's customer
database and its supplier database.
b. For performance reasons, a piece of information may be fully or partially duplicated and stored in
databases at different geographical locations. For example, customer information may be stored at both
the branches and the headquarters.
c. Different ownership of information can also lead to the same information being stored in different
databases. For example, information about a raw-material item may be stored in several production
databases because each production line wants to own a copy of the information and to exercise control
over it.
When two or more databases represent overlapping sets of real-world objects, there is a strong need
to integrate them in order to support cross-functional information systems. It is therefore important
to examine strategies for database integration. An important aspect of database integration is the
definition of a global schema that describes the combined (or integrated) database. Here, we define
schema integration as the process of merging the schemas of databases, and instance integration as the
process of integrating the database instances. Schema integration is a problem well studied by database
researchers (Batini, Lenzerini, and Navathe, 1986; Hayne and Ram, 1990; Kaul, Drosten, and Neuhold, 1990;
Larson, Navathe, and Elmasri, 1989; Spaccapietra, Parent, and Dupont, 1992). These approaches identify the
correspondences between schema constructs (e.g., entity types and attributes) from different databases
and resolve their differences. The end result is a global schema that describes the integrated database.
In contrast, instance integration focuses on merging the actual values found in instances from different
databases. There are two major problems in instance integration:
a. entity identification; and
b. attribute value conflict resolution.
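The two instance-integration problems listed above can be illustrated as a two-step pipeline: first match the instances that describe the same real-world object, then merge the attribute values of each matched pair. The Python sketch below is a minimal illustration, not a method taken from the cited works. It assumes that records are plain dicts, that both databases share a common identifying attribute (the hypothetical `cust_no`), and that conflicts are settled by per-attribute resolution functions chosen by the integrator. The names `identify_entities`, `resolve_conflicts`, `prefer_first`, and the sample data are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Resolver = Callable[[object, object], object]


def identify_entities(db_a: List[Record], db_b: List[Record],
                      key: str) -> List[Tuple[Record, Record]]:
    """Entity identification, simplified: pair records that share a value
    for an assumed common key attribute (a hypothetical simplification)."""
    index_b = {rec[key]: rec for rec in db_b if rec.get(key) is not None}
    return [(rec, index_b[rec[key]]) for rec in db_a
            if rec.get(key) in index_b]


def prefer_first(a: object, b: object) -> object:
    """Default policy (assumed): keep the value from the first database."""
    return a


def resolve_conflicts(a: Record, b: Record,
                      resolvers: Dict[str, Resolver],
                      default: Resolver = prefer_first) -> Record:
    """Attribute value conflict resolution for one matched pair."""
    merged: Record = {}
    for attr in sorted(set(a) | set(b)):
        va, vb = a.get(attr), b.get(attr)
        if va is None or vb is None or va == vb:
            # No conflict: the values agree or one side is missing.
            merged[attr] = vb if va is None else va
        else:
            # Conflict: apply this attribute's resolution function.
            merged[attr] = resolvers.get(attr, default)(va, vb)
    return merged


# Hypothetical sample data: the same customers held at a branch and at headquarters.
branch = [
    {"cust_no": 101, "name": "Acme Ltd", "phone": "555-0100", "credit_limit": 5000},
    {"cust_no": 102, "name": "Bolt Inc", "phone": None, "credit_limit": 2000},
]
headquarters = [
    {"cust_no": 101, "name": "Acme Limited", "phone": "555-0100", "credit_limit": 8000},
    {"cust_no": 102, "name": "Bolt Inc", "phone": "555-0199", "credit_limit": 2000},
    {"cust_no": 103, "name": "Core plc", "phone": "555-0123", "credit_limit": 1000},
]

# Step 1: entity identification; step 2: conflict resolution on matched pairs only.
pairs = identify_entities(branch, headquarters, key="cust_no")
integrated = [resolve_conflicts(a, b, {"credit_limit": max}) for a, b in pairs]
for rec in integrated:
    print(rec)
```

Matching on a shared key is the easy case; entity identification is hard in general precisely because databases often lack a common, consistently valued identifier. Likewise, taking `max` for `credit_limit` and the first source's value otherwise is only one illustrative policy, and unmatched records (such as customer 103) are simply left out of this sketch rather than carried into the integrated result.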
The entity identification problem involves matching data instances that represent the same real-world
objects. The attribute value conflict resolution problem involves merging the values of matching data
instances. Entity identification has been studied in (Chatterjee and Segev, 1991; Lim, Srivastava,
Prabhakar, and Richardson, 1993; Wang and Madnick, 1989), and attribute value conflict resolution in
(DeMichiel, 1989; Lim, Srivastava, Prabhakar, and Richardson, 1993; Lim, Srivastava, and Shekhar, 1994;
Tsai and Chen, 1993). Attribute value conflicts cannot be resolved without entity identification, because
conflict resolution can only be performed on matching data instances. In defining the integrated
database, one has to choose a global data model so that the global schema can be described by the
constructs the data model provides. The queries that can be formulated against the integrated database also depend on