An Extended Relational Model & SQL for Fuzzy Multidatabases - Advanced Database Query Systems

INTRODUCTION

Databases hold data that represent properties of real-world objects. Ideally, a set of real-world objects

can be described by the constructs of a single data model and stored in one and only one database.

Nevertheless, in reality, one can usually find two or more databases storing information about the same

real-world objects. Several factors give rise to these overlapping representations, including:

• Different roles played by the same real-world objects in different applications. For example, a company can be both a customer and a supplier of a firm. Hence, the company's information can be found in both the customers' database and the suppliers' database.

• For performance reasons, a piece of information may be fully or partially duplicated and stored in databases at different geographical locations. For example, the customers' information may be stored at both the branches and the headquarters.

• Different ownership of information can also lead to information being stored in different databases. For example, the information about a raw material item may be stored in several production databases because each production line wants to own a copy of the information and to exercise control over it.

When two or more databases represent overlapping sets of real-world objects, there is a strong need

to integrate these databases in order to support applications of cross-functional information systems. It

is therefore important to examine strategies for database integration. An important aspect of database

integration is the definition of a global schema that captures the description of the combined (or integrated)

database. Here, we define schema integration to be the process of merging schemas of databases,

and instance integration to be the process of integrating the database instances. Schema integration is a

problem well studied by database researchers (Batini, Lenzerini, and Navathe, 1986; Hayne and Ram,

1990; Kaul, Drosten, and Neuhold, 1990; Larson, Navathe, and Elmasri, 1989; Spaccapietra, Parent,

and Dupont, 1992). The solution approaches identify the correspondences between schema constructs

(e.g., entity types and attributes) from different databases and resolve their differences. The end result

is a global schema which describes the integrated database. In contrast, instance integration focuses on

merging the actual values found in instances from different databases. There are two major problems

in instance integration:

a. entity identification; and

b. attribute value conflict resolution
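As a rough illustration of these two problems, the following minimal Python sketch first pairs up records that describe the same real-world object (entity identification) and then merges each matched pair (attribute value conflict resolution). It is not the method of any of the works cited here: the record layout, the name-based matching key, and the merge rule that keeps both candidate values when the sources disagree are all assumptions made for this example.

```python
def normalize(name):
    """Hypothetical matching key: case- and whitespace-insensitive name."""
    return " ".join(name.lower().split())


def identify_entities(db_a, db_b):
    """Entity identification: pair records that share the same matching key."""
    index_b = {normalize(r["name"]): r for r in db_b}
    return [
        (a, index_b[normalize(a["name"])])
        for a in db_a
        if normalize(a["name"]) in index_b
    ]


def resolve_conflicts(rec_a, rec_b):
    """Attribute value conflict resolution for one matched pair.

    Illustrative rule only: take the non-missing value when one source
    lacks the attribute, keep a value both sources agree on, and keep
    both candidate values (as a sorted list) when the sources disagree.
    """
    merged = {}
    for attr in sorted(set(rec_a) | set(rec_b)):
        va, vb = rec_a.get(attr), rec_b.get(attr)
        if va is None or vb is None:
            merged[attr] = va if va is not None else vb
        elif va == vb:
            merged[attr] = va
        else:
            merged[attr] = sorted([va, vb])
    return merged


# Made-up sample data: the same company appears as a customer and a supplier.
customers = [{"name": "Acme Corp", "city": "Boston", "phone": "555-0100"}]
suppliers = [
    {"name": "Acme Corp", "city": "Chicago", "rating": "A"},
    {"name": "Other Ltd", "city": "Austin"},
]

integrated = [
    resolve_conflicts(a, b) for a, b in identify_entities(customers, suppliers)
]
```

In this sketch only the matched pair is merged; the unmatched supplier record is left out, which mirrors the point that value conflicts can only be resolved once matching instances have been identified.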

The entity identification problem involves matching data instances that represent the same real-world

objects. The attribute value conflict resolution problem involves merging the values of matching data

instances. These two problems have been studied in (Chatterjee and Segev, 1991; Lim, Srivastava, Prabhakar

and Richardson, 1993; Wang and Madnick, 1989) and (DeMichiel, 1989; Lim, Srivastava, Prabhakar

and Richardson, 1993; Lim, Srivastava and Shekhar, 1994; Tasi and Chen, 1993) respectively. It is not

possible to have attribute value conflicts resolved without entity identification because attribute value

conflict resolution can only be done for matching data instances. In defining the integrated database, one

has to choose a global data model so that the global schema can be described by the constructs provided

by the data model. The queries that can be formulated against the integrated database also depend on
