[Figure omitted: bar charts comparing NAT, RLS, and MRP; (a) percentage of transferred data for queries Q5 (HJ), Q7 (REPJ), Q9 (HJ), and Q17 (COG); (b) percentage of transferred data for cluster sizes 5 to 25.]
Fig. 2. Percentage of transferred data for (a) different types of queries, (b) varying cluster and data size
depending on the cluster size, so that each node is assigned 2GB of data. Fig. 2(b) shows the percentage of transferred data for the three approaches as the number of cluster nodes increases. As shown, with an increasing number of nodes our approach maintains a steady data locality, whereas it decreases for the other approaches. Since there is no skew in the key frequencies, both native Hadoop and RLS obtain a data locality close to 1 divided by the number of nodes. Our experiments with different data sizes for the same cluster size show no change in the percentage of transferred data for MR-Part (the results are not shown in the paper due to space restrictions).
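The near-1/n locality of the baseline approaches follows from hash partitioning alone, and can be checked with a small simulation (a sketch under simplified assumptions, not the authors' code): if each reducer's input is produced uniformly across the cluster nodes and the reducer runs on a single node, only about 1/n of that input is local, so roughly 1 - 1/n of the intermediate data must be shuffled.

```python
import random

def expected_transfer(num_nodes, num_keys=100_000, seed=0):
    """Estimate the fraction of intermediate data shuffled over the
    network when keys are hash-partitioned and reduce tasks run on
    arbitrary nodes (the no-co-location baseline)."""
    rng = random.Random(seed)
    # Each key's map output lives on the node that produced it ...
    producer = [rng.randrange(num_nodes) for _ in range(num_keys)]
    # ... and its reducer is assigned to a node by hashing the key.
    consumer = [hash(k) % num_nodes for k in range(num_keys)]
    remote = sum(p != c for p, c in zip(producer, consumer))
    return remote / num_keys

for n in (5, 10, 20):
    print(n, round(expected_transfer(n), 3))  # approaches 1 - 1/n
```

The estimate converges to (n - 1)/n as the number of keys grows, matching the observation that the baselines' data locality drops toward 1/n with cluster size while a co-partitioning scheme can keep it steady.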
Response Time. As shown in the previous subsection, MR-Part can significantly reduce the amount of data transferred in the shuffle phase. However, its impact on response time strongly depends on the network bandwidth. In this section, we measure the effect of MR-Part on MapReduce response time by varying the network bandwidth. We control the point-to-point bandwidth using the Linux tc command-line utility. We execute query Q5 on a cluster of 20 nodes with a scale factor of 40 (40GB total dataset size).
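One common way to impose such a cap with tc is a token-bucket filter on each node's network interface. The interface name (eth0) and the exact rate, burst, and latency values below are illustrative assumptions, not the settings used in the paper:

```shell
# Cap outgoing bandwidth on eth0 to 10 Mbit/s with a token-bucket filter.
# Run as root on every cluster node; adjust the device name to your setup.
tc qdisc add dev eth0 root tbf rate 10mbit burst 32kbit latency 400ms

# Inspect the active queueing discipline.
tc qdisc show dev eth0

# Remove the cap once the experiment is done.
tc qdisc del dev eth0 root
```

Repeating the experiment at several rate values then yields the bandwidth sweep reported below.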
The results are shown in Fig. 3. As we can see in Fig. 3(a), the slower the network, the greater the impact of data locality on execution time. To show where the improvement is produced, in Fig. 3(b) we report the time spent in data shuffling. Measuring shuffle time is not straightforward, since in native Hadoop shuffling starts once 5% of the map tasks have finished and proceeds in parallel as the rest complete. Because of that, we represent two lines: NAT-ms, the time from when the first shuffle byte is sent until the phase is completed, and NAT-os, the period during which the system is dedicated only to shuffling (after the last map task finishes). For MR-Part only the second line has to be represented, as the system must wait for all map tasks to complete before scheduling reduce tasks. We can observe that, while shuffle time is almost constant for MR-Part regardless of the network conditions, it increases significantly as the network bandwidth decreases for the other alternatives. As a consequence, the response time of MR-Part is less sensitive to the network bandwidth than that of native Hadoop. For instance, at 10 Mbps, MR-Part executes in around 30% less time than native Hadoop.