1. If the output of one task τ is input to another task, then the size of τ's output will be accounted for when measuring the input size for the receiving task. Thus, there is no reason to count the size of any output except for those tasks whose output forms the result of the entire algorithm.
2. In practice, the output of a job is rarely large compared with the input
or the intermediate data produced by the job. The reason is that massive
outputs cannot be used unless they are summarized or aggregated in some
way. For example, although we talked in Example 2.6 of computing the
entire transitive closure of a graph, in practice we would want something
much simpler, such as the count of the number of nodes reachable from
each node, or the set of nodes reachable from a single node.
Example 2.8 : Let us evaluate the communication cost for the join algorithm
from Section 2.3.7. Suppose we are joining R(A, B) ⋈ S(B, C), and the sizes
of relations R and S are r and s, respectively. Each chunk of the files holding
R and S is fed to one Map task, so the sum of the communication costs for all
the Map tasks is r + s. Note that in a typical execution, each Map task will be executed at a compute node holding a copy of the chunk to which it applies. Thus, no internode communication is needed for the Map tasks, but
they still must read their data from disk. Since all the Map tasks do is make a
simple transformation of each input tuple into a key-value pair, we expect that
the computation cost will be small compared with the communication cost,
regardless of whether the input is local to the task or must be transported to
its compute node.
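To make the Map phase concrete, here is a minimal sketch in Python of the transformation just described; the tuple layout and the "R"/"S" origin tags are illustrative assumptions for exposition, not the API of any particular MapReduce system.

```python
# Sketch of the Map functions for R(A, B) joined with S(B, C).
# Each input tuple becomes a single key-value pair keyed on the
# join attribute B; the "R"/"S" tag records the relation of
# origin so the Reduce side can tell the two apart.

def map_r(a, b):
    # An R-tuple (a, b) maps to key b with value ("R", a).
    return (b, ("R", a))

def map_s(b, c):
    # An S-tuple (b, c) maps to key b with value ("S", c).
    return (b, ("S", c))
```

Each input tuple yields exactly one pair of about the same size, which is why the total Map output, and hence the Map-to-Reduce communication, is roughly r + s.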
The sum of the outputs of the Map tasks is roughly as large as their inputs. Each output key-value pair is sent to exactly one Reduce task, and it is unlikely that this Reduce task will execute at the same compute node. Therefore, communication from Map tasks to Reduce tasks is likely to be across the interconnect of the cluster, rather than memory-to-disk. This communication is O(r + s), so the communication cost of the join algorithm is O(r + s).
Observe that the Reduce tasks can use a hash join of the tuples received.
This process involves hashing each of the tuples received on their B-values,
using a different hash function from the one that divided the tuples among
Reduce tasks. The local hash join takes time that is linear in the number of
tuples received, and thus is also O(r + s). We do not count this execution
time in the communication-cost model, but it is comforting to know that the
computation cost is surely not the dominant factor for this algorithm.
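A sketch of this local hash join follows, assuming a Reduce task sees its input as a list of (B-value, tagged-tuple) pairs in the format produced above; the bucket count and the salted hash are illustrative choices standing in for "a different hash function".

```python
from collections import defaultdict

def local_hash_join(received_pairs, num_buckets=1024):
    """One Reduce task's hash join over the pairs it received.

    received_pairs: iterable of (b, (tag, value)) pairs, where tag
    is "R" or "S". num_buckets is an arbitrary illustrative choice.
    """
    # Hash each tuple on its B-value with a hash function independent
    # of the one that routed keys to Reduce tasks (modeled by salting).
    buckets = defaultdict(list)
    for b, tagged in received_pairs:
        buckets[hash(("salt", b)) % num_buckets].append((b, tagged))

    # Within each bucket, pair every R-tuple with every S-tuple that
    # shares its B-value; the work is linear in the tuples received
    # plus the number of joined tuples produced.
    for bucket in buckets.values():
        r_vals, s_vals = defaultdict(list), defaultdict(list)
        for b, (tag, value) in bucket:
            (r_vals if tag == "R" else s_vals)[b].append(value)
        for b, a_list in r_vals.items():
            for c in s_vals.get(b, []):
                for a in a_list:
                    yield (a, b, c)
```

Since bucket assignment depends only on the B-value, all R-tuples and S-tuples that can join land in the same bucket, so no joined pair is missed.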
The output size for the join can be either larger or smaller than r + s,
depending on how likely it is that a given R-tuple joins with a given S-tuple.
For example, if there are many different B-values, we would expect the output
to be small, while if there are few B-values, a large output is likely. However,
we shall rely on our supposition that if the output of the join is large, then
there is probably some aggregation being done to reduce the size of the output.
This aggregation typically can be executed by the Reduce tasks as they produce
their output.
□
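For instance, if only the number of join results per B-value is wanted, a Reduce task can aggregate as it joins rather than emitting every tuple. The counting reducer below is our own illustration of this idea, reusing the tagged-pair format assumed above.

```python
def reduce_count(b, tagged_values):
    # tagged_values: the ("R", a) and ("S", c) pairs received for
    # key b. The join for this key would produce r_count * s_count
    # tuples, so the count can be emitted without materializing them.
    r_count = s_count = 0
    for tag, _ in tagged_values:
        if tag == "R":
            r_count += 1
        else:
            s_count += 1
    return (b, r_count * s_count)
```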