But if the nature of the data and of the applications that use it require all of
the data in the replicated tables worldwide always to be consistent, accurate, and
up-to-date, then a more complex "synchronous" procedure must be put in place.
While there are variations on this theme, the basic process for accomplishing this is
known as the "two-phase commit." The two-phase commit works like this. Each
computer on the network has a special log file in addition to its database tables.
So, in Figure 12.9, each of the five cities has one of these special log files. Now,
when an update is to be made at one site, the distributed DBMS has to do several
things. It has to freeze all the replicated copies of the table involved, send the update
out to all the sites with the table copies, and then be sure that all the copies were
updated. After all of that happens, all of the replicated copies of the table will have
been updated and processing can resume. Remember that, for this to work properly,
either all of the replicated copies must be updated or none of them must be.
What we don't want is for the update to take place at some of the sites and not at
the others, since this would obviously leave the copies inconsistent.
Let's look at an example using Table D in Figure 12.9. Copies of Table D are
located in Los Angeles, Memphis, and Paris. Say that someone issues an update
request to a record in Table D in Memphis. In the first, or "prepare," phase of the
two-phase commit, the computer in Memphis sends the updated data to Los Angeles
and Paris. The computers in all three cities write the update to their logs (but not to
their actual copies of Table D at this point). The computers in Los Angeles and Paris
attempt to lock their copies of Table D to get ready for the update. If another process
is using their copy of Table D then they will not be able to do this. Los Angeles and
Paris then report back to Memphis whether or not they are in good operating shape
and whether or not they were able to lock Table D. The computer in Memphis takes
in all of this information and then decides whether to continue with the update or
to abort it. If Los Angeles and Paris report back that they are up and running and
were able to lock Table D, then the computer in Memphis will decide to go ahead
with the update. If the news from Los Angeles and Paris was bad, Memphis will
decide not to go ahead with the update. So, in the second, or "commit," phase of
the two-phase commit, Memphis sends its decision to Los Angeles and Paris. If it
decides to complete the update, then all three cities transfer the updated data from
their logs to their copy of Table D. If it decides to abort the update, then none of
the sites transfer the updated data from their logs to their copy of Table D. All three
copies of Table D remain as they were and Memphis can start the process all over
again.
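The two phases described above can be sketched in a few lines of code. This is a minimal simulation, not a real DBMS protocol implementation: the Site class, its method names, and the update format are all assumptions made for illustration, and the "another process is using the table" situation is simulated with a simple flag.

```python
class Site:
    """One computer in the network, holding a replicated copy of
    Table D plus the special log file the text describes."""

    def __init__(self, name):
        self.name = name
        self.table_d = {}      # the actual replicated copy of Table D
        self.log = None        # the pending update, written during "prepare"
        self.locked = False    # True if some process holds Table D

    def prepare(self, update):
        """Phase 1: write the update to the log and try to lock Table D.
        Returns the report sent back to the coordinator."""
        if self.locked:
            return False       # another process is using Table D
        self.log = update
        self.locked = True
        return True

    def commit(self):
        """Phase 2, success: transfer the update from the log to Table D."""
        key, value = self.log
        self.table_d[key] = value
        self.log = None
        self.locked = False

    def abort(self):
        """Phase 2, failure: discard the log; Table D stays as it was."""
        if self.log is not None:   # only sites that prepared release the lock
            self.log = None
            self.locked = False


def two_phase_commit(coordinator, participants, update):
    """Run the protocol from the coordinator (Memphis in the example)."""
    sites = [coordinator] + participants
    # Phase 1: every site prepares and reports back whether it succeeded.
    reports = [site.prepare(update) for site in sites]
    ready = all(reports)
    # Phase 2: commit everywhere or abort everywhere -- never a mix.
    for site in sites:
        site.commit() if ready else site.abort()
    return ready
```

Using the Figure 12.9 example, `two_phase_commit(memphis, [la, paris], update)` returns True and updates all three copies only when every site reports a successful prepare; if any site cannot lock its copy, all three logs are discarded and no copy of Table D changes.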
The two-phase commit is certainly a complex, costly, and time-consuming
process. It should be clear that the more volatile the data in the database is, the less
attractive this type of synchronous procedure becomes for updating replicated
tables in the distributed database.
Distributed Joins
Let's take a look at the issue of distributed joins, which came up earlier. In a
distributed database in which no single computer (no single city) in the network
contains the entire database, there is the possibility that a query will be run from
one computer requiring a join of two or more tables that are not all at the same
computer. Consider the distributed database design in Figure 12.9. Let's say that a
query is issued at Los Angeles that requires the join of Tables E and F. First of all,
neither of the two tables is located at Los Angeles, the site that issued the query.
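The situation just described can be made concrete with a small sketch. The table contents and keys below are invented for illustration, and the naive plan shown here, shipping complete copies of both tables to the query site, is only one possible strategy a distributed DBMS might consider.

```python
# Tables E and F each live only at remote sites, not at Los Angeles,
# where the query is issued. The data and join key are hypothetical.
TABLE_E_REMOTE = {101: "Accounting", 102: "Engineering"}   # at one remote site
TABLE_F_REMOTE = {101: "Smith", 103: "Jones"}              # at another remote site

def fetch(remote_table):
    """Stand-in for transmitting an entire table copy over the network."""
    return dict(remote_table)

def join_at_query_site():
    """Naive plan: ship both tables to Los Angeles, then join locally
    on the shared key."""
    e = fetch(TABLE_E_REMOTE)   # network cost: all of Table E
    f = fetch(TABLE_F_REMOTE)   # network cost: all of Table F
    return {k: (e[k], f[k]) for k in e.keys() & f.keys()}
```

Even in this tiny example, the query site pays to transmit every row of both tables just to produce the one matching row, which is why choosing where to perform a distributed join matters.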