DISADVANTAGES OF DISTRIBUTED DATABASES
Distributed databases have the following disadvantages:
Update of replicated data. Replicating data can improve processing speed and ensure that the
overall system remains available even when the database at one site is unavailable. However,
replication can cause update problems, most obviously in terms of the extra time needed to update
all the copies. Instead of updating a single copy of the data, the DBMS must update several
copies. Because most of these copies are at sites other than the site initiating the update, each
update transaction requires extra time to update each copy and extra time to communicate all the
update messages over the network.
Replicated data causes another, slightly more serious problem. Assume an update transaction
must update data that is replicated at five sites and that the fifth site is currently unavailable.
If all updates must be made or none at all, the update transaction fails. Because the data at a single
site is unavailable for update, that data is unavailable for update at all sites. This situation
certainly contradicts the earlier advantage of increased system availability. On the other hand, if you
do not require that all updates be made, the data will be inconsistent.
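The trade-off described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout for a site and the function names update_all_or_nothing and update_available_only are hypothetical and do not come from any particular DDBMS.

```python
def update_all_or_nothing(sites, key, value):
    """Apply the update at every site, or at none if any site is unavailable."""
    if not all(site["available"] for site in sites):
        return False  # the transaction fails and no copy is changed
    for site in sites:
        site["data"][key] = value
    return True


def update_available_only(sites, key, value):
    """Apply the update wherever possible; report whether the copies still agree."""
    for site in sites:
        if site["available"]:
            site["data"][key] = value
    # The copies are consistent only if every site holds the same value.
    return len({site["data"].get(key) for site in sites}) == 1


def make_sites():
    """Five hypothetical replicas; the fifth site is currently unavailable."""
    return [{"available": i != 4, "data": {"price": 100}} for i in range(5)]
```

With the fifth site down, the first function refuses the update everywhere (reduced availability), while the second leaves the fifth copy with the old value (inconsistency).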
Often a DDBMS uses a compromise strategy. The DDBMS designates one copy of the data to
be the primary copy. As long as the primary copy is updated, the DDBMS considers the update
to be complete. The primary site and the DDBMS must ensure that all the other copies are in sync.
The primary site sends update transactions to the other sites and notes whether any sites are
currently unavailable. If a site is unavailable, the primary site must try to send the update again
at some later time and continue trying until it succeeds. This strategy overcomes the basic
problem, but it obviously uses more time. Further, if the primary site is unavailable, the problem
remains unresolved.
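A minimal sketch of the primary copy strategy follows, assuming in-memory sites and a simple retry queue. The class names Site and PrimarySite and the methods apply, update, and flush are hypothetical, chosen only for illustration; a real DDBMS would also need logging, persistence, and concurrency control.

```python
from collections import deque


class Site:
    """A hypothetical site holding one copy of the replicated data."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.data = {}

    def apply(self, key, value):
        if not self.available:
            raise ConnectionError(f"site {self.name} is unavailable")
        self.data[key] = value


class PrimarySite(Site):
    """Holds the primary copy and forwards updates to the other sites."""

    def __init__(self, name, replicas):
        super().__init__(name)
        self.replicas = replicas
        self.pending = deque()  # (replica, key, value) updates not yet delivered

    def update(self, key, value):
        # The update counts as complete once the primary copy is changed.
        # If the primary site itself is unavailable, this raises and the
        # update cannot proceed at all.
        self.apply(key, value)
        for replica in self.replicas:
            self.pending.append((replica, key, value))
        self.flush()

    def flush(self):
        """Try each pending update once; keep the failures for a later retry."""
        still_pending = deque()
        while self.pending:
            replica, key, value = self.pending.popleft()
            try:
                replica.apply(key, value)
            except ConnectionError:
                still_pending.append((replica, key, value))
        self.pending = still_pending
```

Calling flush again at some later time models the primary site retrying until every unavailable site has received the update.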
More complex query processing. Processing queries is more complex in a distributed database. The
complexity arises from the difference between the time it takes to send messages between sites and
the time it takes to access a disk. As discussed earlier, minimizing message traffic is extremely
important in a distributed database environment. To illustrate the complexity involved with query
processing, consider the following query for Premiere Products: List all parts in item class SG with a
price that is more than $500.00. For this query, assume (1) the Part table contains 1,000 rows and is
stored at a remote site; (2) each row in the Part table is 500 bits long; (3) there is no special
structure, such as an index, that would be helpful in processing this query faster; and (4) only 10 of
the 1,000 rows in the Part table satisfy the conditions. How would you process this query?
One query strategy involves retrieving each row from the remote site and examining the item
class and price to determine whether the row should be included in the result. For each row, this
solution requires two messages. The first is a message from the local site to the remote site
requesting a row. It is followed by the second message, which is from the remote site to the local
site containing the data or, ultimately, an indication that there is no more data because you have
retrieved every row in the table. Thus, in addition to the database accesses, this strategy
requires 2,000 messages. Once again, suppose you have a network with an access delay of 2 seconds
and a transmission rate of 750,000 bits per second. Based on the calculations for communication
time earlier in this chapter, each message requires approximately 2 seconds. You calculate
the communication time for this query strategy as follows:
Communication time = 2 * 2,000
= 4,000 seconds, or approximately 66.7 minutes
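The arithmetic for this first strategy can be checked with a short Python sketch. The helper message_time and the constant names are assumptions made for illustration. The sketch also assumes that a request message carries a negligible number of bits and that each reply carries one 500-bit row.

```python
ACCESS_DELAY = 2.0           # seconds per message, from the example
TRANSMISSION_RATE = 750_000  # bits per second, from the example
ROWS = 1_000
ROW_BITS = 500


def message_time(bits):
    """Time for one message: access delay plus time to transmit the payload."""
    return ACCESS_DELAY + bits / TRANSMISSION_RATE


# Approximation used in the text: every message takes about 2 seconds.
messages = 2 * ROWS
approx_seconds = messages * ACCESS_DELAY

# Slightly more exact figure: each reply also transmits one 500-bit row.
exact_seconds = ROWS * (message_time(0) + message_time(ROW_BITS))

print(f"{messages} messages, about {approx_seconds:.0f} seconds "
      f"({approx_seconds / 60:.1f} minutes)")
```

The payload adds well under a second in total, which is why the text treats each message as costing roughly the 2-second access delay.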
A second query strategy involves sending a single message from the local site to the remote
site requesting the complete answer to the query. The remote site examines each row in the table
and finds the 10 rows that satisfy the query. The remote site then sends a single message back to
the local site containing all 10 rows in the answer. You calculate the communication time for this
query strategy as follows:
Communication time = 2 + (2 + ((10 * 500) / 750,000))
= 2 + (2 + (5,000 / 750,000))
= 2 + (2 + 0.007)
= 4.007 seconds
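To compare the two strategies side by side, the following Python sketch simulates both against a made-up Part table. The table contents, the column names, and the function names row_at_a_time and filter_at_remote are hypothetical. Only the row count, row size, access delay, and transmission rate come from the example, and request messages are assumed to carry a negligible payload.

```python
ACCESS_DELAY = 2.0           # seconds per message
TRANSMISSION_RATE = 750_000  # bits per second
ROW_BITS = 500

# Hypothetical Part table: 1,000 rows, of which exactly 10 satisfy the query.
parts = [
    {"part_num": i,
     "item_class": "SG" if i < 10 else "HW",
     "price": 600.00 if i < 10 else 100.00}
    for i in range(1_000)
]


def matches(row):
    """Item class SG with a price of more than $500.00."""
    return row["item_class"] == "SG" and row["price"] > 500.00


def row_at_a_time(table):
    """Strategy 1: fetch every row over the network and filter locally."""
    result, messages, seconds = [], 0, 0.0
    for row in table:
        messages += 2
        seconds += ACCESS_DELAY                                 # request
        seconds += ACCESS_DELAY + ROW_BITS / TRANSMISSION_RATE  # reply with one row
        if matches(row):
            result.append(row)
    return result, messages, seconds


def filter_at_remote(table):
    """Strategy 2: the remote site filters and returns only the answer."""
    result = [row for row in table if matches(row)]
    reply_bits = len(result) * ROW_BITS
    seconds = ACCESS_DELAY + (ACCESS_DELAY + reply_bits / TRANSMISSION_RATE)
    return result, 2, seconds
```

Both functions return the same 10 rows; they differ only in how many messages cross the network, which is what dominates the communication time.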