Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism - Computer Architecture: A Quantitative Approach


6.10 [Discussion/15] <6.4, 6.5> Discuss some options to better utilize the excess servers during

the off-peak hours or options to save costs. Given the interactive nature of WSCs, what are

some of the challenges to aggressively reducing power usage?

6.11 [Discussion/25] <6.4, 6.6> Propose one possible way to improve TCO by focusing on

reducing server power. What are the challenges to evaluating your proposal? Estimate the

TCO improvements based on your proposal. What are the advantages and drawbacks?

Exercises

6.12 [10/10/10] <6.1> One of the important enablers of WSC is ample request-level parallelism,

in contrast to instruction or thread-level parallelism. This question explores the implication

of different types of parallelism on computer architecture and system design.

a. [10] <6.1> Discuss scenarios where improving the instruction- or thread-level

parallelism would provide greater benefits than achievable through request-level parallelism.

b. [10] <6.1> What are the software design implications of increasing request-level parallelism?

c. [10] <6.1> What are potential drawbacks of increasing request-level parallelism?

6.13 [Discussion/15/15] <6.2> When a cloud computing service provider receives jobs

consisting of multiple Virtual Machines (VMs) (e.g., a MapReduce job), many scheduling options

exist. The VMs can be scheduled in a round-robin manner to spread across all available

processors and servers or they can be consolidated to use as few processors as possible.

Using these scheduling options, if a job with 24 VMs was submitted and 30 processors were

available in the cloud (each able to run up to 3 VMs), round-robin would use 24 processors,

while consolidated scheduling would use 8 processors. The scheduler can also find

available processor cores at different scopes: socket, server, rack, and an array of racks.

a. [Discussion] <6.2> Assuming that the submitted jobs are all compute-heavy workloads,

possibly with different memory bandwidth requirements, what are the pros

and cons of round-robin versus consolidated scheduling in terms of power and

cooling costs, performance, and reliability?

b. [15] <6.2> Assuming that the submitted jobs are all I/O-heavy workloads, what are the

pros and cons of round-robin versus consolidated scheduling, at different scopes?

c. [15] <6.2> Assuming that the submitted jobs are network-heavy workloads, what are

the pros and cons of round-robin versus consolidated scheduling, at different scopes?
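
The processor counts quoted in the setup of Exercise 6.13 (24 for round-robin, 8 for consolidated) can be reproduced with a short calculation. The following is a minimal sketch, not part of the original exercise: the function name `processors_used` and the policy strings are hypothetical, and it assumes that round-robin places one VM on each processor before any processor receives a second VM.

```python
import math


def processors_used(num_vms, num_processors, vms_per_processor, policy):
    """Count processors hosting at least one VM (hypothetical helper)."""
    if num_vms > num_processors * vms_per_processor:
        raise ValueError("job does not fit in the available capacity")
    if policy == "round_robin":
        # Assumption: every processor gets one VM before any gets a second,
        # so the job touches as many processors as it can.
        return min(num_vms, num_processors)
    if policy == "consolidated":
        # Pack each processor to its limit before using the next one.
        return math.ceil(num_vms / vms_per_processor)
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    for policy in ("round_robin", "consolidated"):
        print(policy, processors_used(24, 30, 3, policy))
```

The same function could be applied at other scopes (socket, server, rack) by treating `num_processors` as the number of units at that scope, although the exercise itself asks for a qualitative comparison rather than a count.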

6.14 [15/15/10/10] <6.2, 6.3> MapReduce enables large amounts of parallelism by having data-

independent tasks run on multiple nodes, often using commodity hardware; however,

there are limits to the level of parallelism. For example, for redundancy, MapReduce will

write data blocks to multiple nodes, consuming disk and potentially network bandwidth.

Assume a total dataset size of 300 GB, a network bandwidth of 1 Gb/sec, a 10 sec/GB map

rate, and a 20 sec/GB reduce rate. Also assume that 30% of the data must be read from re-

mote nodes, and each output ile is writen to two other nodes for redundancy. Use Figure

6.6 for all other parameters.

a. [15] <6.2, 6.3> Assume that all nodes are in the same rack. What is the expected

runtime with 5 nodes? 10 nodes? 100 nodes? 1000 nodes? Discuss the bottlenecks at

each node size.
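
As a starting point for part (a), the stated parameters of Exercise 6.14 can be turned into a first-order estimate. This is only a sketch under loud assumptions that are not in the exercise: the data is split evenly across nodes, the reduce rate is applied to the full input size, each node has its own 1 Gb/sec link, and 1 GB is 8 Gb. It does not model the two redundant output copies (the output size is not given) or any of the Figure 6.6 parameters, so it is not a complete answer. The function name `first_order_estimate` is hypothetical.

```python
DATASET_GB = 300.0          # total dataset size from the exercise
MAP_SEC_PER_GB = 10.0       # map rate from the exercise
REDUCE_SEC_PER_GB = 20.0    # reduce rate from the exercise
REMOTE_FRACTION = 0.30      # share of data read from remote nodes
LINK_GBIT_PER_SEC = 1.0     # network bandwidth from the exercise
BITS_PER_BYTE = 8.0


def first_order_estimate(nodes):
    """Return (compute_sec, remote_read_sec) per node (hypothetical helper)."""
    # Assumption: the dataset is partitioned evenly across nodes.
    per_node_gb = DATASET_GB / nodes
    # Assumption: reduce processes as much data as map does.
    compute_sec = per_node_gb * (MAP_SEC_PER_GB + REDUCE_SEC_PER_GB)
    # Assumption: each node pulls its remote share over a dedicated link.
    remote_gb = per_node_gb * REMOTE_FRACTION
    remote_read_sec = remote_gb * BITS_PER_BYTE / LINK_GBIT_PER_SEC
    return compute_sec, remote_read_sec


if __name__ == "__main__":
    for n in (5, 10, 100, 1000):
        compute_sec, remote_read_sec = first_order_estimate(n)
        print(f"{n:5d} nodes: compute {compute_sec:8.2f} s, "
              f"remote read {remote_read_sec:7.2f} s")
```

Under these assumptions both terms shrink in proportion to the node count, which is why the bottleneck discussion has to bring in the shared-bandwidth and latency figures from Figure 6.6 and the replication traffic that this sketch leaves out.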
