With this background, we can now examine the Google architecture. Most companies, when faced with a huge database, a massive transaction rate, and the need for high reliability, would buy the biggest, fastest, and most reliable equipment on the market. Google did just the opposite. It bought cheap, modest-performance PCs. Lots of them. And with them, it built the world's largest off-the-shelf cluster. The driving principle behind this decision was simple: optimize price/performance.
The logic behind this decision lies in economics: commodity PCs are very cheap, high-end servers are not, and large multiprocessors are even less so. Thus, while a high-end server might have two or three times the performance of a midrange desktop PC, it will typically cost 5 to 10 times as much, which is not cost effective.
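To make the price/performance argument concrete, here is a minimal sketch that compares performance per dollar for a commodity PC and a high-end server. The prices and relative performance figures are assumptions chosen only to illustrate the ratio, not actual vendor or Google numbers.

# Illustrative price/performance comparison (all numbers are assumed).
def performance_per_dollar(price_dollars, relative_performance):
    return relative_performance / price_dollars

commodity_pc = performance_per_dollar(price_dollars=1000, relative_performance=1.0)
high_end_server = performance_per_dollar(price_dollars=8000, relative_performance=2.5)

# With these assumed figures the commodity PC delivers about 3 times as
# much performance per dollar, so a large pile of cheap PCs wins even
# though each individual server is faster.
print(commodity_pc / high_end_server)    # roughly 3.2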
Of course, cheap PCs fail more often than top-of-the-line servers, but the latter fail, too, so the Google software had to be designed to work with failing hardware no matter what kind of equipment it was using. Once the fault-tolerance software was written, it did not really matter whether the failure rate was 0.5% per year or 2% per year; failures had to be dealt with either way. Google's experience is that about 2% of the PCs fail each year. More than half of the failures are due to faulty disks, followed by power supplies and then RAM chips. After burn-in, CPUs never fail. Actually, the biggest source of crashes is not hardware at all; it is software. The first response to a crash is simply to reboot, which often solves the problem (the electronic equivalent of the doctor saying: "Take two aspirins and go to bed.").
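To see why fault tolerance must live in the software, it helps to work out what a 2% annual failure rate means at cluster scale. The sketch below does the arithmetic, assuming a fully populated data center of 5120 PCs as described later in this section.

# Expected machine failures per year and per week at cluster scale.
# The 2% annual rate is the figure quoted in the text; the cluster size
# assumes the fully populated data center described below.
ANNUAL_FAILURE_RATE = 0.02
CLUSTER_SIZE = 5120

failures_per_year = CLUSTER_SIZE * ANNUAL_FAILURE_RATE
failures_per_week = failures_per_year / 52

print(failures_per_year)   # about 102 machines per year
print(failures_per_week)   # about 2 machines per week

# A machine dies every few days, so recovery (detect, reboot, and if
# necessary replace) has to be automatic rather than done by hand.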
A typical modern Google PC consists of a 2-GHz Intel processor, 4 GB of RAM, and a disk of around 2 TB, the kind of thing a grandmother might buy for checking her email occasionally. The only specialty item is an Ethernet chip. Not exactly state of the art, but very cheap. The PCs are housed in 1U-high cases (about 5 cm thick) and stacked 40 high in 19-inch racks, one stack in front and one stack in back, for a total of 80 PCs per rack. The PCs in a rack are connected by switched Ethernet, with the switch inside the rack. The racks in a data center are also connected by switched Ethernet, with two redundant switches per data center used to survive switch failures.
The layout of a typical Google data center is illustrated in Fig. 8-44. The incoming high-bandwidth OC-48 fiber is routed to each of two 128-port Ethernet switches. Similarly, the backup OC-12 fiber is also routed to each of the two switches. The incoming fibers use special input cards and do not occupy any of the 128 Ethernet ports.
Each rack has four Ethernet links coming out of it, two to the left switch and two to the right switch. In this configuration, the system can survive the failure of either switch. Since each rack has four connections to the switches (two from the front 40 PCs and two from the back 40 PCs), it takes four link failures, or two link failures plus a switch failure, to take a rack offline. With a pair of 128-port switches and four links from each rack, up to 64 racks can be supported. With 80 PCs per rack, a data center can have up to 5120 PCs. But, of course, racks do not have to hold exactly 80 PCs, and switches do not have to have exactly 128 ports; these are just typical values.
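The redundancy and capacity claims above follow from simple counting. The sketch below checks when a rack loses connectivity and recomputes the maximum number of racks and PCs from the port counts quoted in the text; the function and variable names are ours, used only for illustration.

# Capacity: each rack uses 2 ports on each of the two 128-port switches.
SWITCH_PORTS = 128
LINKS_PER_RACK_PER_SWITCH = 2
PCS_PER_RACK = 80

max_racks = SWITCH_PORTS // LINKS_PER_RACK_PER_SWITCH   # 64 racks
max_pcs = max_racks * PCS_PER_RACK                      # 5120 PCs

# Connectivity: a rack stays online as long as at least one of its four
# uplinks ends at a working switch.
def rack_online(failed_links, failed_switches):
    links = [("left", 0), ("left", 1), ("right", 0), ("right", 1)]
    return any(link not in failed_links and link[0] not in failed_switches
               for link in links)

print(max_racks, max_pcs)                                   # 64 5120
print(rack_online({("left", 0), ("left", 1)}, set()))       # True: right switch still reachable
print(rack_online({("right", 0), ("right", 1)}, {"left"}))  # False: two link failures plus a switch failure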