Databases Reference
In-Depth Information
Then, notice that none of the other four cities has a copy of both Tables E and F.
That means that there is no one city to which the query can be sent for complete
processing, including the join.
In order to handle this type of distributed join situation, the distributed
DBMS must have a sophisticated ability to move data from one city to another to
accomplish the join. In Chapter 4, we described the relational DBMS's relational
query optimizer as an expert system that figures out an efficient way to respond to
and satisfy a relational query. Similarly, the distributed DBMS must have its own
built-in expert system that is capable of figuring out an efficient way to handle a
request for a distributed join. This distributed DBMS expert system will work hand
in hand with the relational query optimizer, which will still be needed to determine
which records of a particular table are needed to satisfy the join, among other things.
For the query issued from Los Angeles that requires a join of Tables E and F, there
are several options:
Figure out which records of Table E are involved in the join and send copies of
them from either Memphis or Paris (each of which has a copy of Table E) to
either New York or Tokyo (each of which has a copy of the other table involved
in the join, Table F). Then, execute the join in whichever of New York or Tokyo
was chosen to receive the records from Table E and send the result back to Los
Angeles.
Figure out which records of Table F are involved in the join and send copies of
them from either New York or Tokyo (each of which has a copy of Table F) to
either Memphis or Paris (each of which has a copy of the other table involved
in the join, Table E). Then, execute the join in whichever of Memphis or Paris
was chosen to receive the records from Table F and send the result back to Los
Angeles.
Figure out which records of Table E are involved in the join and send copies of
them from either Memphis or Paris (each of which has a copy of Table E) to
Los Angeles, the city that initiated the join request. Figure out which records of
Table F are involved in the join and send copies of them from either New York
or Tokyo (each of which has a copy of Table F) to Los Angeles. Then, execute
the join in Los Angeles, the site that issued the query.
How does the distributed DBMS decide among these options? It must consider:
The number and size of the records from each table involved in the join.
The distances and costs of transmitting the records from one city to another to
execute the join.
The distance and cost of shipping the result of the join back to the city that issued
the query in the first place.
For example, if only 20 records of Table E are involved in the join while all of
Table F is needed, then it would make sense to send copies of the 20 Table E
records to a city that has a copy of Table F. The join can then be executed at the
Table F city and the result sent back to Los Angeles. Looking at the arrangement of
tables in Figure 12.9, one solution would be to send the 20 records from Table E
in Memphis to New York, one of the cities with Table F. The query could then be
executed in New York and the result sent to Los Angeles, which issued the query.
Why Memphis and New York rather than Paris and Tokyo, the other cities that have
Search WWH ::




Custom Search