varied customers inside an organization. WSC programmers customize third-party software or build their own, and WSCs have much more homogeneous hardware; the WSC goal is to make the hardware/software in the warehouse act like a single computer that typically runs a variety of applications. Often the largest cost in a conventional datacenter is the people to maintain it, whereas, as we shall see in Section 6.4, in a well-designed WSC the server hardware is the greatest cost, and people costs shift from the topmost to nearly irrelevant. Conventional datacenters also don't have the scale of a WSC, so they don't get the economic benefits of scale mentioned above. Hence, while you might consider a WSC as an extreme datacenter, in that computers are housed separately in a space with special electrical and cooling infrastructure, typical datacenters share little with the challenges and opportunities of a WSC, either architecturally or operationally.
Since few architects understand the software that runs in a WSC, we start with the workload
and programming model of a WSC.
6.2 Programming Models and Workloads for Warehouse-Scale Computers
If a problem has no solution, it may not be a problem, but a fact—not to be solved, but to be coped
with over time.
Shimon Peres
In addition to the public-facing Internet services such as search, video sharing, and social
networking that make them famous, WSCs also run batch applications, such as converting
videos into new formats or creating search indexes from Web crawls.
Today, the most popular framework for batch processing in a WSC is MapReduce [Dean and Ghemawat 2008] and its open-source twin Hadoop. Figure 6.2 shows the increasing popularity of MapReduce at Google over time. (Facebook runs Hadoop on 2000 batch-processing servers of the 60,000 servers it is estimated to have in 2011.) Inspired by the Lisp functions of the same name, Map first applies a programmer-supplied function to each logical input record. Map runs on thousands of computers to produce an intermediate result of key-value pairs. Reduce collects the output of those distributed tasks and collapses them using another programmer-defined function. With appropriate software support, both are highly parallel yet easy to understand and to use. Within 30 minutes, a novice programmer can run a MapReduce task on thousands of computers.
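To make the two roles concrete, the following is a minimal, single-process sketch in Python of word count, the canonical MapReduce example. The names map_fn, reduce_fn, and map_reduce are ours for illustration only, not Google's or Hadoop's API; a production framework would additionally shard the input, run the map tasks on thousands of servers, and shuffle the intermediate key-value pairs across the network before reducing.

    # A minimal, single-machine sketch of the MapReduce idea using word count.
    # In a real WSC the runtime would spread map tasks over thousands of
    # servers and move intermediate key-value pairs to the reducers; here
    # everything runs in one process to show only the semantics.
    from collections import defaultdict

    def map_fn(record):
        # Programmer-supplied Map: emit (key, value) pairs for one input record.
        for word in record.split():
            yield (word, 1)

    def reduce_fn(key, values):
        # Programmer-supplied Reduce: collapse all values seen for one key.
        return (key, sum(values))

    def map_reduce(records, map_fn, reduce_fn):
        # "Shuffle" step: group the intermediate values by key.
        groups = defaultdict(list)
        for record in records:
            for key, value in map_fn(record):
                groups[key].append(value)
        # Reduce step: apply the reducer to each key's list of values.
        return [reduce_fn(key, values) for key, values in sorted(groups.items())]

    if __name__ == "__main__":
        docs = ["the quick brown fox", "the lazy dog", "the quick dog"]
        print(map_reduce(docs, map_fn, reduce_fn))
        # [('brown', 1), ('dog', 2), ('fox', 1), ('lazy', 1), ('quick', 2), ('the', 3)]

The shuffle step between the two user-supplied functions, which groups values by key, is precisely the part that the WSC runtime distributes and makes fault tolerant, leaving the programmer to write only the two small functions.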
 