DATABASE MANAGEMENT APPROACHES - Concepts of Database Management

DISADVANTAGES OF DISTRIBUTED DATABASES

Distributed databases have the following disadvantages:

● Update of replicated data. Replicating data can improve processing speed and ensure that the overall system remains available even when the database at one site is unavailable. However, replication can cause update problems, most obviously in terms of the extra time needed to update all the copies. Instead of updating a single copy of the data, the DBMS must update several copies. Because most of these copies are at sites other than the site initiating the update, each update transaction requires extra time to update each copy and extra time to communicate all the update messages over the network.

Replicated data causes another, slightly more serious problem. Assume an update transaction must update data that is replicated at five sites and that the fifth site is currently unavailable. If all updates must be made or none at all, the update transaction fails. Because the data at a single site is unavailable for update, that data is unavailable for update at all sites. This situation certainly contradicts the earlier advantage of increased system availability. On the other hand, if you do not require that all updates be made, the data will be inconsistent.
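
The trade-off in the paragraph above can be made concrete with a small in-memory sketch. The `Site` class and the functions `update_all_or_nothing`, `update_available_only`, and `copies_consistent` are hypothetical names used only for illustration; a real DDBMS would coordinate the sites over the network rather than loop over a Python list. With the fifth of five sites unavailable, the all-or-nothing update is refused at every site, while the best-effort update leaves the copies disagreeing.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One hypothetical site holding a replica of the data."""
    name: str
    available: bool = True
    data: dict = field(default_factory=dict)


def update_all_or_nothing(sites, key, value):
    """Apply the update at every site, or at none if any site is down."""
    if not all(site.available for site in sites):
        return False  # a single unavailable site blocks the update everywhere
    for site in sites:
        site.data[key] = value
    return True


def update_available_only(sites, key, value):
    """Apply the update only at available sites; copies may now disagree."""
    updated = []
    for site in sites:
        if site.available:
            site.data[key] = value
            updated.append(site.name)
    return updated


def copies_consistent(sites, key):
    """True when every replica holds the same value for key."""
    return len({site.data.get(key) for site in sites}) == 1


sites = [Site(f"site{i}", data={"price": 100}) for i in range(1, 6)]
sites[4].available = False  # the fifth site is unavailable

print(update_all_or_nothing(sites, "price", 120))  # False
print(update_available_only(sites, "price", 120))  # ['site1', 'site2', 'site3', 'site4']
print(copies_consistent(sites, "price"))           # False
```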

Often a DDBMS uses a compromise strategy. The DDBMS designates one copy of the data to be the primary copy. As long as the primary copy is updated, the DDBMS considers the update to be complete. The primary site and the DDBMS must ensure that all the other copies are in sync. The primary site sends update transactions to the other sites and notes whether any sites are currently unavailable. If a site is unavailable, the primary site must try to send the update again at some later time and continue trying until it succeeds. This strategy overcomes the basic problem, but it obviously uses more time. Further, if the primary site is unavailable, the problem remains unresolved.
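
A minimal sketch of the primary-copy strategy follows, assuming an in-memory model. `PrimarySite`, `Replica`, `update`, and `propagate` are hypothetical names, and "try again at some later time" is reduced to calling `propagate` again. The update is acknowledged as soon as the primary copy changes, and updates for unavailable sites stay queued until they are delivered. The sketch does not model a failed primary site, which is the case the text notes remains unresolved.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A hypothetical secondary site holding a copy of the data."""
    name: str
    available: bool = True
    data: dict = field(default_factory=dict)


@dataclass
class PrimarySite:
    """Hypothetical primary site that owns the primary copy."""
    replicas: list
    data: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # (replica, key, value) not yet delivered

    def update(self, key, value):
        # The update counts as complete once the primary copy changes.
        self.data[key] = value
        for replica in self.replicas:
            self.pending.append((replica, key, value))
        self.propagate()
        return True

    def propagate(self):
        """Send queued updates; keep those for unavailable sites to retry later."""
        still_pending = []
        for replica, key, value in self.pending:
            if replica.available:
                replica.data[key] = value
            else:
                still_pending.append((replica, key, value))
        self.pending = still_pending
        return len(self.pending)  # number of updates still waiting


replicas = [Replica(f"site{i}") for i in range(2, 6)]
replicas[-1].available = False  # site5 is unavailable
primary = PrimarySite(replicas)

primary.update("price", 120)
print(primary.propagate())      # 1 (still waiting for site5)
replicas[-1].available = True   # site5 comes back
print(primary.propagate())      # 0
print(replicas[-1].data)        # {'price': 120}
```

Queuing per-site updates in arrival order keeps each replica's updates in the same sequence as the primary's, at the cost of the extra time and retries the text mentions.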

● More complex query processing. Processing queries is more complex in a distributed database. The complexity arises from the large difference between the time it takes to send a message between sites and the time it takes to access a disk. As discussed earlier, minimizing message traffic is extremely important in a distributed database environment. To illustrate the complexity involved with query processing, consider the following query for Premiere Products: List all parts in item class SG with a price that is more than $500.00. For this query, assume (1) the Part table contains 1,000 rows and is stored at a remote site; (2) each row in the Part table is 500 bits long; (3) there is no special structure, such as an index, that would be helpful in processing this query faster; and (4) only 10 of the 1,000 rows in the Part table satisfy the conditions. How would you process this query?

One query strategy involves retrieving each row from the remote site and examining the item class and price to determine whether the row should be included in the result. For each row, this strategy requires two messages. The first is a message from the local site to the remote site requesting a row. It is followed by a second message from the remote site to the local site containing the data or, ultimately, an indication that there is no more data because you have retrieved every row in the table. Thus, in addition to the database accesses, this strategy requires 2,000 messages. Once again, suppose you have a network with an access delay of 2 seconds and a transmission rate of 750,000 bits per second. Based on the calculations for communication time earlier in this chapter, each message requires approximately 2 seconds: the access delay dominates, because transmitting a 500-bit row adds only 500 / 750,000 ≈ 0.0007 second. You calculate the communication time for this query strategy as follows:

$$
\begin{aligned}
\text{Communication time} &= 2 \times 2{,}000 \\
&= 4{,}000 \text{ seconds, or } 66.7 \text{ minutes}
\end{aligned}
$$
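
The first strategy's cost can be checked with a short Python sketch. It assumes each message costs the access delay plus its data bits divided by the transmission rate (the communication-time model used in this chapter) and that each request message carries negligible data. The constant and function names are hypothetical. The exact figure differs from the 4,000-second estimate only by the tiny transmission time of each 500-bit reply.

```python
ACCESS_DELAY = 2.0   # seconds per message
RATE = 750_000       # bits per second
ROW_BITS = 500       # bits per Part row
ROWS = 1_000         # rows in the Part table


def message_time(bits):
    """Access delay plus transmission time for one message."""
    return ACCESS_DELAY + bits / RATE


# Strategy 1: for every row, one request (assumed empty) and one reply with the row.
exact = ROWS * (message_time(0) + message_time(ROW_BITS))
approx = 2 * ROWS * ACCESS_DELAY  # 2,000 messages at about 2 seconds each

print(f"exact: {exact:.2f} s")                             # exact: 4000.67 s
print(f"approx: {approx:.0f} s = {approx / 60:.1f} min")   # approx: 4000 s = 66.7 min
```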

A second query strategy involves sending a single message from the local site to the remote site requesting the complete answer to the query. The remote site examines each row in the table and finds the 10 rows that satisfy the query. The remote site then sends a single message back to the local site containing all 10 rows in the answer. In the calculation, the first 2 is the access delay for the request message, and the term in parentheses is the access delay plus the transmission time for the reply carrying 10 rows of 500 bits each. You calculate the communication time for this query strategy as follows:

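$$
\begin{aligned}
\text{Communication time} &= 2 + \left(2 + \frac{10 \times 500}{750{,}000}\right) \\
&= 2 + \left(2 + \frac{5{,}000}{750{,}000}\right) \\
&\approx 2 + (2 + 0.0067) \\
&\approx 4.007 \text{ seconds}
\end{aligned}
$$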
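
The same message-time model (access delay plus data bits divided by transmission rate) shows how large the gap between the two strategies is. This Python sketch assumes the request message carries negligible data and takes the 4,000-second figure for the first strategy from the text; the constant and function names are hypothetical. Because the access delay dominates, sending 5,000 bits adds well under a hundredth of a second, so the second strategy costs roughly two messages instead of 2,000.

```python
ACCESS_DELAY = 2.0          # seconds per message
RATE = 750_000              # bits per second
ROW_BITS = 500              # bits per Part row
MATCHING_ROWS = 10          # Part rows in item class SG priced above $500.00
STRATEGY_1_SECONDS = 4_000  # row-by-row figure computed in the text


def message_time(bits):
    """Access delay plus transmission time for one message."""
    return ACCESS_DELAY + bits / RATE


# Strategy 2: one request message (assumed to carry negligible data) and
# one reply message carrying all matching rows; the remote site does the filtering.
request = message_time(0)
reply = message_time(MATCHING_ROWS * ROW_BITS)
strategy_2 = request + reply

print(f"strategy 2: {strategy_2:.4f} s")                         # strategy 2: 4.0067 s
print(f"ratio: {STRATEGY_1_SECONDS / strategy_2:.0f}x faster")   # ratio: 998x faster
```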