Distributed Data Allocation - Physical Database Design

Both I/O time and network delay time for these data allocation methods can be accurately estimated using the simple mathematical estimation models given in Appendix A.

TIPS AND INSIGHTS FOR DATABASE PROFESSIONALS

• Tip 1. Determine when to replicate data across a network on a simple basis of costs and benefits. Benefits occur when you can save network delay time and I/O time by placing a copy of the data closer to the source of a query. Costs occur when the extra copy of the data must be updated every time the original data is updated. Benefits and costs are both estimated in terms of elapsed time (network delay time plus I/O time). In general, you want to add a copy of data to a site when the benefit exceeds the cost.

• Tip 2. When benefits and costs are approximately equal, decide whether to replicate data based on greater availability. When you have multiple copies of data, the availability of data is greater. This could be a major concern when remote sites go down unexpectedly and you have high-priority queries to satisfy. Analyze the benefits of greater availability and make your decision about data replication based on both tangible and intangible benefits.

• Tip 3. When the workload is very complex, use the dominant transaction approach. One disadvantage of the data allocation methods presented in this chapter is that they use averages for query times and update times. Averages do not account for dominant transactions whose I/O specifications are known, in an environment where the network configuration and network protocol details are also given. Under such circumstances, the actual I/O times and network delay times can be estimated for individual dominant transactions instead of as averages across all transactions. A dominant transaction is defined by criteria such as high frequency of execution, high volume of data accessed, tight response time constraints, and explicit priority.
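
The cost-benefit rule in Tip 1 can be illustrated with a minimal Python sketch. It is not the estimation model from Appendix A: the class name, field names, and sample figures below are hypothetical, and the elapsed times (network delay plus I/O) are assumed to have been estimated already. The sketch takes the benefit as the query time saved by a local copy and the cost as the time spent keeping that copy up to date.

```python
from dataclasses import dataclass


@dataclass
class SiteWorkload:
    """Hypothetical workload figures for one table at one candidate site."""
    query_freq: float         # queries per time unit issued from this site
    remote_query_time: float  # elapsed ms per query answered by a remote copy
    local_query_time: float   # elapsed ms per query answered by a local copy
    update_freq: float        # updates per time unit that must reach the copy
    update_time: float        # elapsed ms to apply one update to the extra copy


def replication_benefit(w: SiteWorkload) -> float:
    # Time saved per time unit by answering queries locally instead of remotely.
    return w.query_freq * (w.remote_query_time - w.local_query_time)


def replication_cost(w: SiteWorkload) -> float:
    # Time spent per time unit propagating updates to the extra copy.
    return w.update_freq * w.update_time


def should_replicate(w: SiteWorkload) -> bool:
    # Tip 1: add a copy at the site only when the benefit exceeds the cost.
    return replication_benefit(w) > replication_cost(w)


# Made-up figures: a query-heavy site and a site that issues few queries.
busy_site = SiteWorkload(100, 500, 100, 50, 600)
quiet_site = SiteWorkload(10, 500, 100, 50, 600)

print(should_replicate(busy_site))   # True  (benefit 40000 > cost 30000)
print(should_replicate(quiet_site))  # False (benefit 4000 < cost 30000)
```

For a dominant transaction (Tip 3), the same comparison could be run with that transaction's own estimated times and frequency in place of workload-wide averages. When benefit and cost come out roughly equal, Tip 2 suggests letting availability decide.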

16.5 Summary

Distributed database design requires one more step of analysis than centralized database design, but there exists a set of basic principles we can use for everyday design decisions. Replicated data allocation methods can be simply expressed and implemented to minimize the time to execute a collection of transactions on a distributed database. The methods take into account the execution times of remote and local database transactions for query and update, and the frequencies of these transactions. Good estimating techniques for average I/O time and network delay costs can easily be applied to these methods.
