decade are $72M for the facility and 3.3 × $67M, or $221M, for servers. Thus, the capital costs
for servers in a WSC over a decade are a factor of three higher than for the WSC facility.
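To make the arithmetic explicit, here is a minimal sketch that reproduces the comparison. The dollar figures come from the text; the only added assumption is that the 3.3 factor reflects replacing servers roughly every three years over a ten-year facility life.

```python
# Figures from the text: $72M facility CAPEX over a decade, $67M per round of
# server purchases; ~3.3 rounds assumes servers are replaced about every 3 years
# over a 10-year facility lifetime (an assumption, stated here explicitly).
facility_capex = 72e6        # $ for the facility over the decade
server_round   = 67e6        # $ per round of server purchases
server_rounds  = 3.3         # ~10-year facility life / ~3-year server life

server_capex = server_rounds * server_round           # ~$221M
print(f"Server CAPEX over the decade: ${server_capex / 1e6:.0f}M")
print(f"Servers vs. facility: {server_capex / facility_capex:.1f}x")  # ~3x
```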
Pitfall: Trying to Save Power with Inactive Low-Power Modes Versus Active Low-Power Modes
Figure 6.3 on page 440 shows that the average utilization of servers is between 10% and 50%.
Given the concern about the operational costs of a WSC from Section 6.4, you would think
low-power modes would be a huge help.
As Chapter 1 mentions, you cannot access DRAMs or disks in these inactive low-power modes,
so you must return to fully active mode to read or write, no matter how low the rate. The
pitfall is that the time and energy required to return to fully active mode make inactive low-power
modes less attractive. Figure 6.3 shows that almost all servers average at least 10% utilization,
so you might expect long periods of low activity but not long periods of inactivity.
In contrast, processors still run in lower-power modes at a small multiple of the regular rate,
so active low-power modes are much easier to use. Note that the time to move to fully active
mode for processors is also measured in microseconds, so active low-power modes also address
the latency concerns about low-power modes.
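A back-of-the-envelope break-even calculation shows why short idle gaps defeat inactive low-power modes. All numbers below are hypothetical, chosen only to illustrate the shape of the trade-off; none come from the text.

```python
# Hypothetical numbers, for illustration only (not from the text).
P_idle  = 100.0   # W consumed while idle but fully active
P_sleep =   5.0   # W consumed in the inactive low-power mode
E_wake  =  50.0   # J of extra energy to return to fully active mode
T_wake  =   0.5   # s to return to fully active mode (added request latency)

# An idle gap saves energy only if the savings while asleep exceed the wake cost:
# (P_idle - P_sleep) * gap > E_wake  =>  gap > E_wake / (P_idle - P_sleep)
break_even_gap = E_wake / (P_idle - P_sleep)
print(f"Idle gap must exceed ~{break_even_gap:.2f} s to save energy, "
      f"and every wake-up adds {T_wake} s of latency")

# A server at 10% utilization that sees a request every 100 ms never has a gap
# that long, so the inactive mode costs both energy and latency -- the pitfall.
```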
Pitfall: Using Too Wimpy a Processor When Trying to Improve WSC Cost-Performance
Amdahl's law still applies to WSCs, as there will be some serial work for each request, and that
can increase request latency if it runs on a slow server [Hölzle 2010] [Lim et al. 2008]. If the
serial work increases latency, then the cost of using a wimpy processor must include the software
development costs to optimize the code to return it to the lower latency. The larger number
of threads across many slow servers can also be more difficult to schedule and load balance,
and the resulting variability in thread performance can lead to longer latencies. A 1-in-1000 chance
of bad scheduling is probably not an issue with 10 tasks, but it is with 1000 tasks when you
have to wait for the longest task. Many smaller servers can also lead to lower utilization, as
it's clearly easier to schedule when there are fewer things to schedule. Finally, even some parallel
algorithms become less efficient when the problem is partitioned too finely. The current Google rule
of thumb is to use the low end of the range of server-class computers [Barroso and Hölzle 2009].
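The 1-in-1000 scheduling argument is easy to quantify: when a request must wait for its slowest task, the chance that at least one of N tasks is scheduled badly grows quickly with N. A short sketch:

```python
# Probability that at least one of n parallel tasks hits the 1-in-1000
# bad-scheduling case; the request is as slow as its slowest task.
p_bad = 1e-3
for n_tasks in (10, 100, 1000):
    p_any_bad = 1 - (1 - p_bad) ** n_tasks
    print(f"{n_tasks:4d} tasks: P(at least one badly scheduled) = {p_any_bad:.1%}")
# ~1% with 10 tasks, but ~63% with 1000 tasks.
```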
As a concrete example, Reddi et al. [2010] compared embedded microprocessors (Atom) and
server microprocessors (Nehalem Xeon) running the Bing search engine. They found that the
latency of a query was about three times longer on the Atom than on the Xeon. Moreover, the Xeon
was more robust: as load increases on the Xeon, quality of service degrades gradually and modestly,
while the Atom quickly violates its quality-of-service target as it tries to absorb additional load.
This behavior translates directly into search quality. Given the importance of latency to the
user, as Figure 6.12 suggests, the Bing search engine uses multiple strategies to refine search
results as long as the query latency has not yet exceeded a cutoff latency. The lower latency of the
larger Xeon nodes means they can spend more time refining search results. Hence, even when
the Atom had almost no load, it gave worse answers than the Xeon for 1% of the queries. At normal
loads, 2% of the answers were worse.
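The refine-until-cutoff behavior the text attributes to Bing can be sketched as a simple deadline loop; the function name refine_once and the cutoff value below are illustrative placeholders, not Bing's actual interface.

```python
import time

def search_with_cutoff(query, refine_once, cutoff_s=0.2):
    """Refine results until the latency cutoff is reached (illustrative sketch;
    refine_once and cutoff_s are hypothetical, not Bing's actual mechanism)."""
    deadline = time.monotonic() + cutoff_s
    results = []
    while time.monotonic() < deadline:
        if not refine_once(query, results):   # one refinement pass; False when done
            break
    # A faster node completes more refinement passes before the cutoff,
    # which is why the slower Atom returned worse answers at the same cutoff.
    return results
```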