DISADVANTAGES OF DISTRIBUTED DATABASES
Distributed databases have the following disadvantages:
• Update of replicated data. Replicating data can improve processing speed and ensure that the
overall system remains available even when the database at one site is unavailable. However,
replication can cause update problems, most obviously in terms of the extra time needed to
update all the copies. Instead of updating a single copy of the data, the DBMS must update
several copies. Because most of these copies are at sites other than the site initiating the update,
each update transaction requires extra time to update each copy and extra time to communicate
all the update messages over the network.
Replicated data causes another, slightly more serious problem. Assume an update transaction
must update data that is replicated at five sites and that the fifth site is currently unavailable. If all
updates must be made or none at all, the update transaction fails. Because the data at a single site
is unavailable for update, that data is unavailable for update at all sites. This situation certainly
contradicts the earlier advantage of increased system availability. On the other hand, if you do not
require that all updates be made, the data will be inconsistent.
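The trade-off described above can be illustrated with a minimal in-memory sketch. The `Site` class, the function names, and the `price` values are hypothetical and exist only for illustration; a real DDBMS would coordinate sites over a network with a commit protocol rather than a simple availability check.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical stand-in for one site holding a copy of the data."""
    name: str
    available: bool = True
    data: dict = field(default_factory=dict)


def update_all_or_nothing(sites, key, value) -> bool:
    """Apply the update at every site, or at none if any site is unavailable."""
    if not all(site.available for site in sites):
        return False  # the transaction fails; no copy is changed
    for site in sites:
        site.data[key] = value
    return True


def update_available_only(sites, key, value) -> None:
    """Apply the update only at the sites that can currently be reached."""
    for site in sites:
        if site.available:
            site.data[key] = value


def is_consistent(sites, key) -> bool:
    """True when every site holds the same value for the key."""
    return len({site.data.get(key) for site in sites}) == 1


# Five sites hold the same data; the fifth is currently unavailable.
sites = [Site(f"site{i}") for i in range(1, 6)]
for site in sites:
    site.data["price"] = 100
sites[4].available = False

committed = update_all_or_nothing(sites, "price", 120)
print(committed, is_consistent(sites, "price"))  # False True

update_available_only(sites, "price", 120)
print(is_consistent(sites, "price"))  # False
```

The first call leaves the data consistent but blocks the update at all five sites; the second lets the update through but leaves the fifth copy out of date.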
Often a DDBMS uses a compromise strategy. The DDBMS designates one copy of the data to
be the primary copy. As long as the primary copy is updated, the DDBMS considers the update to be
complete. The primary site and the DDBMS must ensure that all the other copies are in sync. The
primary site sends update transactions to the other sites and notes whether any sites are currently
unavailable. If a site is unavailable, the primary site must try to send the update again at some later
time and continue trying until it succeeds. This strategy overcomes the basic problem, but it obviously
uses more time. Further, if the primary site is unavailable, the problem remains unresolved.
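A rough sketch of the primary copy strategy follows, assuming a simple in-memory model. The `Replica` and `PrimarySite` classes, the `pending` queue, and the `retry_pending` method are illustrative names that do not come from the text. Updates are queued and delivered in order so that a late retry can never overwrite a newer value.

```python
from collections import deque


class Replica:
    """Hypothetical non-primary site holding a copy of the data."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.data = {}

    def apply(self, key, value) -> bool:
        """Apply an update; return False if the site cannot be reached."""
        if not self.available:
            return False
        self.data[key] = value
        return True


class PrimarySite:
    """Holds the primary copy and propagates updates to the other copies."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = deque()  # (replica, key, value) updates not yet delivered

    def update(self, key, value) -> None:
        self.data[key] = value  # the update counts as complete at this point
        for replica in self.replicas:
            self.pending.append((replica, key, value))
        self.retry_pending()

    def retry_pending(self) -> None:
        """Make one delivery pass; undelivered updates stay queued in order."""
        for _ in range(len(self.pending)):
            replica, key, value = self.pending.popleft()
            if not replica.apply(key, value):
                self.pending.append((replica, key, value))


replicas = [Replica(f"site{i}") for i in range(2, 6)]
replicas[3].available = False  # one site is currently unavailable
primary = PrimarySite(replicas)

primary.update("price", 120)
print(len(primary.pending))  # 1

replicas[3].available = True  # the site comes back online
primary.retry_pending()
print(all(r.data["price"] == 120 for r in replicas))  # True
```

The sketch does not address the remaining weakness noted above: if the primary site itself is unavailable, no update can be accepted.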
• More complex query processing. Processing queries is more complex in a distributed database.
The complexity occurs because of the difference in the time it takes to send messages between
sites and the time it takes to access a disk. As discussed earlier, minimizing message traffic is
extremely important in a distributed database environment. To illustrate the complexity
involved with query processing, consider the following query for Premiere Products: List all parts
in item class SG with a price that is more than $500.00. For this query, assume (1) the Part
table contains 1,000 rows and is stored at a remote site; (2) each record in the Part table is 500
bits long; (3) there is no special structure, such as an index, that would be helpful in processing
this query faster; and (4) only 10 of the 1,000 rows in the Part table satisfy the conditions. How
would you process this query?
One query strategy involves retrieving each row from the remote site and examining the
item class and price to determine whether the row should be included in the result. For each
row, this solution requires two messages. The first is a message from the local site to the remote
site requesting a row. It is followed by the second message, which is from the remote site to the
local site, containing the data or, ultimately, an indication that there is no more data because
you have retrieved every row in the table. Thus, in addition to the database accesses, this
strategy requires 2,000 messages. Once again, suppose you have a network with an access delay
of 2 seconds and a transmission rate of 750,000 bits per second. Based on the calculations for
communication time earlier in this chapter, each message requires approximately 2 seconds.
You calculate the communication time for this query strategy as follows:
Communication time = 2 * 2,000
                   = 4,000 seconds, or 66.7 minutes
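The arithmetic for this row-at-a-time strategy can be checked with a short Python sketch. The function name and its parameters are illustrative, and the sketch follows the text's approximation that every message takes about 2 seconds.

```python
def row_at_a_time_seconds(rows: int, seconds_per_message: float = 2.0) -> float:
    """Communication time when each row costs one request and one reply."""
    messages = 2 * rows  # one request plus one reply per row
    return messages * seconds_per_message


total = row_at_a_time_seconds(rows=1_000)
print(f"{total:,.0f} seconds, or {total / 60:.1f} minutes")
# prints "4,000 seconds, or 66.7 minutes"
```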
A second query strategy involves sending a single message from the local site to the remote
site, requesting the complete answer to the query. The remote site examines each row in the
table and finds the ten rows that satisfy the query. The remote site then sends a single message
back to the local site, containing all 10 rows in the answer. You calculate the communication
time for this query strategy as follows:
Communication time = 2 + (2 + ((10 * 500) / 750,000))
                   = 2 + (2 + (5,000 / 750,000))
                   = 2 + (2 + 0.007)
                   = 4.007 seconds (approximately)
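As a cross-check, the sketch below computes the unrounded time for the single-request strategy and compares it with the 4,000 seconds of the row-at-a-time strategy. The `message_seconds` helper is a hypothetical name; it applies the formula used in the calculation above (access delay plus bits sent divided by transmission rate) and, like the text, treats the request message as negligibly small.

```python
def message_seconds(bits: int,
                    access_delay: float = 2.0,
                    bits_per_second: float = 750_000) -> float:
    """Time for one message: access delay plus transmission time."""
    return access_delay + bits / bits_per_second


request = message_seconds(bits=0)        # request size treated as negligible
reply = message_seconds(bits=10 * 500)   # ten rows of 500 bits each
single_round_trip = request + reply

row_at_a_time = 2 * 1_000 * 2.0          # 2,000 messages at about 2 seconds each
speedup = row_at_a_time / single_round_trip

print(f"{single_round_trip:.4f} seconds")  # 4.0067 seconds
print(f"about {speedup:.0f} times faster")
```

Letting the remote site evaluate the conditions and return only the ten matching rows cuts the communication time from more than an hour to about four seconds.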