DATABASE MANAGEMENT APPROACHES - Concepts of Database Management


DISADVANTAGES OF DISTRIBUTED DATABASES

Distributed databases have the following disadvantages:

Update of replicated data. Replicating data can improve processing speed and ensure that the

overall system remains available even when the database at one site is unavailable. However,

replication can cause update problems, most obviously in terms of the extra time needed to

update all the copies. Instead of updating a single copy of the data, the DBMS must update several

copies. Because most of these copies are at sites other than the site initiating the update,

each update transaction requires extra time to update each copy and extra time to communicate

all the update messages over the network.

Replicated data causes another, slightly more serious problem. Assume an update transaction

must update data that is replicated at five sites and that the fifth site is currently unavailable. If all

updates must be made or none at all, the update transaction fails. Because the data at a single site

is unavailable for update, that data is unavailable for update at all sites. This situation certainly

contradicts the earlier advantage of increased system availability. On the other hand, if you do not

require that all updates be made, the data will be inconsistent.
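The trade-off described above can be illustrated with a minimal sketch. The `Site` class and both update functions are hypothetical names invented for this illustration; they do not correspond to any particular DDBMS.

```python
class SiteUnavailable(Exception):
    """Raised when at least one replica site cannot be reached."""


class Site:
    """Hypothetical stand-in for one site holding a copy of the data."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.data = {}


def update_all_or_nothing(sites, key, value):
    """Apply the update at every site, or at none of them."""
    # A single unavailable site is enough to abort the whole update.
    down = [s.name for s in sites if not s.available]
    if down:
        raise SiteUnavailable(f"update aborted, unavailable sites: {down}")
    for s in sites:
        s.data[key] = value


def update_best_effort(sites, key, value):
    """Update every reachable site; return the names of sites that missed it."""
    missed = []
    for s in sites:
        if s.available:
            s.data[key] = value
        else:
            missed.append(s.name)  # this copy is now out of date
    return missed
```

With five sites and the fifth unavailable, `update_all_or_nothing` raises and changes no copy, while `update_best_effort` updates four copies and leaves the fifth inconsistent with them.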

Often a DDBMS uses a compromise strategy. The DDBMS designates one copy of the data to

be the primary copy. As long as the primary copy is updated, the DDBMS considers the update to be

complete. The primary site and the DDBMS must ensure that all the other copies are in sync. The

primary site sends update transactions to the other sites and notes whether any sites are currently

unavailable. If a site is unavailable, the primary site must try to send the update again at some later

time and continue trying until it succeeds. This strategy overcomes the basic problem, but it obviously

uses more time. Further, if the primary site is unavailable, the problem remains unresolved.
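A rough sketch of the primary copy strategy follows. The `Replica` and `PrimarySite` classes, the per-replica queue, and the use of `ConnectionError` to signal an unavailable site are all assumptions made for this illustration, not a description of how any specific DDBMS works.

```python
from collections import deque


class Replica:
    """Hypothetical non-primary site holding a copy of the data."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.data = {}

    def apply(self, key, value):
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self.data[key] = value


class PrimarySite:
    """Hypothetical primary site that propagates updates to the replicas."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        # One FIFO queue of undelivered (key, value) updates per replica,
        # so that delayed updates are applied in their original order.
        self.pending = {r.name: deque() for r in replicas}

    def update(self, key, value):
        # The update counts as complete once the primary copy is changed.
        self.data[key] = value
        for r in self.replicas:
            self.pending[r.name].append((key, value))
            self._flush(r)

    def _flush(self, replica):
        queue = self.pending[replica.name]
        while queue:
            key, value = queue[0]
            try:
                replica.apply(key, value)
            except ConnectionError:
                return  # still unavailable; keep the updates queued
            queue.popleft()

    def retry_pending(self):
        """Try again to deliver queued updates; return how many remain."""
        for r in self.replicas:
            self._flush(r)
        return sum(len(q) for q in self.pending.values())
```

In this sketch, calling `retry_pending` periodically plays the role of the primary site trying again at some later time. Nothing here addresses the case where the primary site itself is unavailable.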


More complex query processing. Processing queries is more complex in a distributed database.

The complexity occurs because of the difference in the time it takes to send messages between

sites and the time it takes to access a disk. As discussed earlier, minimizing message traffic is

extremely important in a distributed database environment. To illustrate the complexity

involved with query processing, consider the following query for Premiere Products: List all parts

in item class SG with a price that is more than $500.00. For this query, assume (1) the Part

table contains 1,000 rows and is stored at a remote site; (2) each record in the Part table is 500

bits long; (3) there is no special structure, such as an index, that would be helpful in processing

this query faster; and (4) only 10 of the 1,000 rows in the Part table satisfy the conditions. How

would you process this query?

One query strategy involves retrieving each row from the remote site and examining the

item class and price to determine whether the row should be included in the result. For each

row, this solution requires two messages. The first is a message from the local site to the remote

site requesting a row. It is followed by the second message, which is from the remote site to the

local site, containing the data or, ultimately, an indication that there is no more data because

you have retrieved every row in the table. Thus, in addition to the database accesses, this

strategy requires 2,000 messages. Once again, suppose you have a network with an access delay

of 2 seconds and a transmission rate of 750,000 bits per second. Based on the calculations for

communication time earlier in this chapter, each message requires approximately 2 seconds.

You calculate the communication time for this query strategy as follows:

Communication time = 2 * 2,000

= 4,000 seconds, or 66.7 minutes

A second query strategy involves sending a single message from the local site to the remote

site, requesting the complete answer to the query. The remote site examines each row in the

table and finds the 10 rows that satisfy the query. The remote site then sends a single message

back to the local site, containing all 10 rows in the answer. You calculate the communication

time for this query strategy as follows:

Communication time = 2 + (2 + ((10 * 500) / 750,000))

= 2 + (2 + (5,000 / 750,000))

= 2 + (2 + 0.007)

= 4.007 seconds
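The two calculations can be reproduced with a short script. This is only a sketch of the arithmetic under the stated assumptions (a 2-second access delay, 750,000 bits per second, 500-bit rows); the function names are invented for this illustration.

```python
ACCESS_DELAY = 2.0           # seconds per message, from the example
TRANSMISSION_RATE = 750_000  # bits per second, from the example


def message_time(bits=0):
    """Time for one message: access delay plus data volume over rate."""
    return ACCESS_DELAY + bits / TRANSMISSION_RATE


def row_at_a_time(total_rows):
    """Strategy 1: one request and one reply per row.

    As in the text, each message is approximated as taking the access
    delay alone, because one 500-bit row adds well under a millisecond.
    """
    return 2 * total_rows * ACCESS_DELAY


def single_request(matching_rows, row_bits):
    """Strategy 2: one request, then one reply carrying all matching rows."""
    return message_time() + message_time(matching_rows * row_bits)


if __name__ == "__main__":
    print(f"Strategy 1: {row_at_a_time(1000):,.0f} seconds")
    print(f"Strategy 2: {single_request(10, 500):.3f} seconds")
```

The first strategy comes to 4,000 seconds and the second to just over 4 seconds, so shipping the query to the remote site cuts communication time by a factor of roughly 1,000 in this example.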
