Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism - Computer Architecture: A Quantitative Approach


a. [15] <6.1, 6.2, 6.8> How many servers are required to achieve that SLA, assuming that the WSC receives 30,000 queries per second and follows the query-response time curve shown in Figure 6.24? How many “small” servers are required to achieve that SLA, given this response-time probability curve? Looking only at server costs, how much cheaper must the “wimpy” servers be than the normal servers to achieve a cost advantage for the target SLA?

b. [15] <6.1, 6.2, 6.8> Often “small” servers are also less reliable due to cheaper components. Using the numbers from Figure 6.1, assume that the number of events due to flaky machines and bad memories increases by 30%. How many “small” servers are required now? How much cheaper must those servers be than the standard servers?

c. [20] <6.1, 6.2, 6.8> Now assume a batch processing environment. The “small” servers provide 30% of the overall performance of the regular servers. Still assuming the reliability numbers from Exercise 6.15 part (b), how many “wimpy” nodes are required to provide the same expected throughput as a 2400-node array of standard servers, assuming perfect linear scaling of performance to node size and an average task length of 10 minutes per node? What if the scaling is 85%? 60%?

d. [15] <6.1, 6.2, 6.8> Often the scaling is not a linear function, but instead a logarithmic function. A natural response may be instead to purchase larger nodes that have more computational power per node to minimize the array size. Discuss some of the trade-offs with this architecture.

FIGURE 6.24 Query-response time curve.
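The arithmetic behind the cost comparison in parts (a) and (b) and the node count in part (c) can be sketched as follows. This is only a starting point under stated assumptions, not a worked solution: the function names are hypothetical, the "scaling" percentage is assumed to multiply each small node's relative performance, and the reliability effects (Figure 6.1) and the 10-minute task length are deliberately left out, so they would still have to be folded in by hand.

```python
def break_even_price_ratio(n_standard: int, n_small: int) -> float:
    """Highest price of one small server, as a fraction of one standard
    server's price, at which total server cost is equal.
    Small servers must cost less than this fraction to be cheaper overall."""
    if n_standard <= 0 or n_small <= 0:
        raise ValueError("server counts must be positive")
    return n_standard / n_small


def small_nodes_needed(n_standard: int, perf_pct: int, scaling_pct: int = 100) -> int:
    """Small nodes needed to match the throughput of n_standard nodes.

    perf_pct    -- small-node performance as a percentage of a standard node
    scaling_pct -- assumed scaling efficiency in percent (100 = perfectly linear)

    Assumption: effective per-node throughput = perf_pct * scaling_pct.
    Integer arithmetic is used so the ceiling is not affected by
    floating-point rounding.
    """
    if n_standard <= 0 or perf_pct <= 0 or scaling_pct <= 0:
        raise ValueError("all arguments must be positive")
    numerator = n_standard * 100 * 100
    denominator = perf_pct * scaling_pct
    return -(-numerator // denominator)  # ceiling division


if __name__ == "__main__":
    # Scaling cases named in part (c): perfect, 85%, and 60%.
    for scaling in (100, 85, 60):
        print(scaling, small_nodes_needed(2400, 30, scaling))
```

The server counts passed to `break_even_price_ratio` would come from reading Figure 6.24, which is why the sketch takes them as inputs rather than computing them.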

6.17 [10/10/15] <6.3, 6.8> One trend in high-end servers is toward the inclusion of nonvolatile Flash memory in the memory hierarchy, either through solid-state disks (SSDs) or PCI Express-attached cards. Typical SSDs have a bandwidth of 250 MB/sec and latency of 75 μs, whereas PCIe cards have a bandwidth of 600 MB/sec and latency of 35 μs.

a. [10] Take Figure 6.7 and include these points in the local server hierarchy. Assuming that the same performance scaling factors apply as for DRAM accessed at different hierarchy levels, how do these Flash memory devices compare when accessed across the rack? Across the array?
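One way to organize the comparison in part (a) is to apply DRAM's remote-to-local ratios to each Flash device. The sketch below assumes a simple multiplicative model, which is one reading of "performance scaling factors"; the `Device` class and `scale` function are hypothetical names, and the factors themselves must be derived from Figure 6.7, so the values in the usage lines are placeholders only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    bandwidth_mb_s: float  # MB/sec
    latency_us: float      # microseconds


# Local-server figures quoted in the exercise text.
SSD = Device("SSD", 250.0, 75.0)
PCIE_CARD = Device("PCIe card", 600.0, 35.0)


def scale(device: Device, bandwidth_factor: float, latency_factor: float) -> Device:
    """Return the device as seen from a farther hierarchy level.

    bandwidth_factor -- DRAM bandwidth at that level / local DRAM bandwidth
    latency_factor   -- DRAM latency at that level / local DRAM latency

    Assumption: Flash scales by the same ratios as DRAM (multiplicative model).
    """
    return Device(
        device.name,
        device.bandwidth_mb_s * bandwidth_factor,
        device.latency_us * latency_factor,
    )


if __name__ == "__main__":
    # Placeholder factors, NOT taken from Figure 6.7; substitute the real ratios.
    placeholder_bandwidth_factor = 0.5
    placeholder_latency_factor = 2.0
    for dev in (SSD, PCIE_CARD):
        print(scale(dev, placeholder_bandwidth_factor, placeholder_latency_factor))
```

Running `scale` once with the rack-level ratios and once with the array-level ratios gives the two sets of points to add to the figure.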
