Distributed Data Allocation - Physical Database Design

2. Heterogeneity: The system can accommodate different hardware, network protocols, data models, query languages, and query capabilities. The participating data sources might be as similar as two versions of Oracle or SQL Server, or as diverse as relational databases, websites running XML, or special applications with other types of databases.

3. Autonomy: The federated system enforces no restrictions at the remote data source, so each source remains autonomous. It is highly desirable that the federated database system not change the local operation of an existing data source.

4. High degree of function: A federated database system should allow applications to exploit not only the high degree of function provided by the federated system, but also the special functions unique to the individual data sources. Typical federated systems use SQL as their common query language, which makes them easier to use than working with each individual local system directly.

5. Extensibility and openness: Federated systems need to evolve over time, and thus need the flexibility to seamlessly add new data sources to the enterprise. A wrapper module provides the data access logic for each data source. In fact, it is common to supply wrappers for a set of known data sources, such as Oracle, Sybase, and XML files, plus some generic ones such as Open Database Connectivity (ODBC). The IBM WebSphere Federation Server provides a wrapper development kit so customers can write their own wrappers for proprietary data sources that cannot be accessed by the native wrappers.

6. Optimized performance: The query optimizer of a relational database system is the component that determines the most efficient way to answer a given query. In a federated system, the optimizer must also determine whether each operation in a query (join, select, union, etc.) should be performed by the federated server or by the local system at each data source. To do this, the optimizer needs not only a cost model for each data source and for the overall network, but also a way to determine whether the query semantics stay identical when an operation is pushed down to the data source (in a query rewrite) rather than performed locally at the federated server. The latter decision is based on information specific to the data source. Once an operation has been identified as remotely executable, the optimizer can use cost-based optimization to determine whether pushing it down is the right decision.
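Objective 4 (high degree of function) asks the federated system to expose both common functions and functions unique to particular sources. One way to picture this is a per-source function-mapping table, sketched below in Python. The mapping tables, the `translate_call` helper, and the choice of NVL2 as the "source-unique" example are all hypothetical illustrations, not the API of any real federation server; the native names themselves (Oracle `SUBSTR`/`LENGTH`, SQL Server `SUBSTRING`/`LEN`) are the usual built-ins of those products.

```python
# Hypothetical function-mapping tables: one generic function name is
# translated into each source's native name.
FUNCTION_MAPPINGS = {
    "oracle": {"SUBSTR": "SUBSTR", "LENGTH": "LENGTH"},
    "sqlserver": {"SUBSTR": "SUBSTRING", "LENGTH": "LEN"},
}

# Functions unique to one source that the federated layer passes through
# unchanged, so applications can still use them (hypothetical lists).
SOURCE_SPECIFIC_FUNCTIONS = {
    "oracle": {"NVL2"},
    "sqlserver": set(),
}


def translate_call(source: str, func: str, args: list[str]) -> str:
    """Render a function call in the native SQL dialect of `source`."""
    name = func.upper()
    mapping = FUNCTION_MAPPINGS[source]
    if name in mapping:
        native = mapping[name]        # common function, renamed per source
    elif name in SOURCE_SPECIFIC_FUNCTIONS[source]:
        native = name                 # source-unique function, passed through
    else:
        raise ValueError(f"{func} is not available at source {source!r}")
    return f"{native}({', '.join(args)})"


if __name__ == "__main__":
    print(translate_call("sqlserver", "substr", ["name", "1", "3"]))  # SUBSTRING(name, 1, 3)
    print(translate_call("oracle", "NVL2", ["bonus", "'yes'", "'no'"]))  # NVL2(bonus, 'yes', 'no')
```

The design keeps the application's SQL uniform while still letting a query reach a function that only one source offers; asking for that function at a source that lacks it fails early instead of producing a remote error.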
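Objective 5 (extensibility and openness) rests on wrapper modules that hold the data access logic for each kind of source. The Python sketch below assumes a minimal wrapper interface with a single `scan` method and a registry on the federated server; the class and method names (`Wrapper`, `FederatedServer.register_wrapper`, and so on) are hypothetical and are not the IBM WebSphere Federation Server wrapper development kit. An in-memory table stands in for a relational source, and a second wrapper reads rows from an XML document.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]


class Wrapper(ABC):
    """Hypothetical wrapper interface: data access logic for one kind of source."""

    @abstractmethod
    def scan(self, table: str) -> Iterator[Row]:
        """Yield every row of `table` as a column-name -> value dict."""


class InMemoryWrapper(Wrapper):
    """Stands in for a relational source: each table is a list of dicts."""

    def __init__(self, tables: Dict[str, List[Row]]):
        self.tables = tables

    def scan(self, table: str) -> Iterator[Row]:
        yield from self.tables[table]


class XmlWrapper(Wrapper):
    """Treats every element named `table` in an XML document as one row."""

    def __init__(self, xml_text: str):
        self.root = ET.fromstring(xml_text)

    def scan(self, table: str) -> Iterator[Row]:
        for elem in self.root.iter(table):
            yield {child.tag: child.text or "" for child in elem}


class FederatedServer:
    """Registers wrapper types, then creates named sources from them."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[..., Wrapper]] = {}
        self._sources: Dict[str, Wrapper] = {}

    def register_wrapper(self, kind: str, factory: Callable[..., Wrapper]) -> None:
        # Supporting a new kind of source needs only a new wrapper class.
        self._factories[kind] = factory

    def add_source(self, name: str, kind: str, *args) -> None:
        self._sources[name] = self._factories[kind](*args)

    def scan(self, source: str, table: str) -> List[Row]:
        return list(self._sources[source].scan(table))


if __name__ == "__main__":
    server = FederatedServer()
    server.register_wrapper("memory", InMemoryWrapper)
    server.register_wrapper("xml", XmlWrapper)
    server.add_source("crm", "memory", {"customer": [{"id": "1", "name": "Ann"}]})
    server.add_source(
        "feed", "xml",
        "<customers><customer><id>2</id><name>Bo</name></customer></customers>",
    )
    print(server.scan("crm", "customer") + server.scan("feed", "customer"))
```

Because the server only depends on the `Wrapper` interface, a customer-written wrapper for a proprietary source plugs in through `register_wrapper` without any change to the server code, which is the openness the objective describes.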
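The two-stage pushdown decision in objective 6 (first check that pushing an operation down preserves its semantics, then compare costs) can be sketched as follows. This is a simplified illustration, not any product's optimizer: it assumes the federated catalog records which operations and functions each source evaluates with the same semantics as the server, and it uses a hypothetical linear cost model with per-row processing and network costs.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Operation:
    kind: str                                # "select", "join", "union", ...
    rows_in: int                             # estimated rows the operation reads
    rows_out: int                            # estimated rows it produces
    functions: FrozenSet[str] = frozenset()  # functions used in predicates


@dataclass(frozen=True)
class SourceInfo:
    # Operations and functions the source evaluates with the SAME semantics
    # as the federated server (assumed to come from the server's catalog).
    supported_ops: FrozenSet[str]
    same_semantics_functions: FrozenSet[str]
    cost_per_row: float                      # assumed processing cost per row at the source


def remotely_executable(op: Operation, src: SourceInfo) -> bool:
    """Stage 1: would pushing `op` down preserve the query semantics?"""
    return op.kind in src.supported_ops and op.functions <= src.same_semantics_functions


def place(op: Operation, src: SourceInfo,
          network_cost_per_row: float, server_cost_per_row: float) -> str:
    """Stage 2: among semantically safe choices, pick the cheaper site."""
    if not remotely_executable(op, src):
        return "federated server"
    # Pushed down: the source does the work; only the result crosses the network.
    push_cost = op.rows_in * src.cost_per_row + op.rows_out * network_cost_per_row
    # Kept at the server: every input row is shipped, then processed centrally.
    local_cost = op.rows_in * (network_cost_per_row + server_cost_per_row)
    return "data source" if push_cost <= local_cost else "federated server"


if __name__ == "__main__":
    fast = SourceInfo(frozenset({"select", "join"}), frozenset({"UPPER"}), 0.001)
    selective = Operation("select", rows_in=1_000_000, rows_out=1_000)
    print(place(selective, fast, 0.01, 0.001))   # data source
    unsafe = Operation("select", 1_000_000, 1_000, frozenset({"SOUNDEX"}))
    print(place(unsafe, fast, 0.01, 0.001))      # federated server
    slow = SourceInfo(frozenset({"select"}), frozenset(), 0.1)
    no_filter = Operation("select", rows_in=1_000_000, rows_out=1_000_000)
    print(place(no_filter, slow, 0.01, 0.001))   # federated server
```

The semantic check runs first because no cost saving justifies a pushdown that could change the answer; the cost comparison then favors pushing down selective operations, whose small results are cheap to ship, unless the source itself is slow.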

None of these objectives explicitly depends on a data allocation strategy. In federated systems, data allocation (or data distribution) is largely left to the discretion of the database designer or database administrator. The data sources are sometimes homogeneous, but usually they are heterogeneous (DB2, Oracle, SQL Server, etc.). Once the allocation decision has been made and the replicated data loaded into the system, the feder-
