For example, Facebook has clusters dedicated to providing service to its own employees.
These clusters receive upgrades first because their employees are willing testers of
new releases; it's part of their job. Next, a small set of outside-user clusters are upgraded.
Lastly, the remaining clusters are upgraded.
Stack Exchange's upgrade process involves many phases. Stack Exchange has more
than 110 web communities, and each community has an associated meta-community
for discussing the community itself. The same software is used for all of these
communities, though the colors and designs are different. The deployment phases are
the test environment, then the meta-communities, then the less populated communities,
and lastly the largest and most active community. Each phase starts automatically if
the previous phase saw no problems for a certain amount of time. By the time the upgrade
reaches the last phase, Stack Exchange has high confidence in the release. The earliest
phases can tolerate more outages for many reasons, including the fact that they are not
revenue-generating units.
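
The phase-by-phase gating described above can be sketched in a few lines of Python. This is only an illustration, not Stack Exchange's actual tooling: the function name run_phased_rollout, the deploy and is_healthy callbacks, and the soak_checks parameter are all hypothetical, and a fixed number of consecutive health checks stands in for "no problems for a certain amount of time."

```python
from typing import Callable, List, Optional


def run_phased_rollout(
    phases: List[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    soak_checks: int = 3,
) -> Optional[str]:
    """Deploy to each phase in order.

    Returns the name of the phase where the rollout halted,
    or None if every phase completed without problems.
    """
    for phase in phases:
        deploy(phase)
        # Hypothetical stand-in for the soak period: the phase must pass
        # soak_checks consecutive health checks before the next one starts.
        # A real system would wait between checks.
        for _ in range(soak_checks):
            if not is_healthy(phase):
                # Halt here; later (larger) phases never see the release.
                return phase
    return None
```

Ordering the phases from least to most critical means a bad release is most likely to be caught where an outage costs the least.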
11.5 Proportional Shedding
Proportional shedding is a deployment technique whereby the new service is built on new
machines in parallel to the old service. Then the load balancer sends, or sheds, a small
percentage of traffic to the new service. If this succeeds, a larger percentage is sent.
This process continues until all traffic is going to the new service.
Proportional shedding can be used to move traffic between two systems. The old cluster
is not turned down until the entire process is complete. If problems are discovered, the load
can be transferred back to the old cluster.
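
As a rough sketch of that ramp-up and rollback logic, the following Python function steps through increasing traffic percentages and returns all load to the old cluster on the first sign of trouble. The names shed_traffic, set_new_share, and new_service_ok are hypothetical, as are the default step percentages; a real load balancer would be driven through its own configuration interface.

```python
from typing import Callable, Sequence


def shed_traffic(
    set_new_share: Callable[[int], None],
    new_service_ok: Callable[[int], bool],
    steps: Sequence[int] = (1, 5, 25, 50, 100),
) -> bool:
    """Shift traffic to the new service in increasing percentages.

    set_new_share(p) tells the load balancer to send p percent of
    traffic to the new service. Returns True if the new service ends
    up with all traffic, or False if the load was rolled back.
    """
    for percent in steps:
        set_new_share(percent)
        if not new_service_ok(percent):
            # Problems discovered: transfer the load back to the old
            # cluster, which is still running at this point.
            set_new_share(0)
            return False
    return True
```

Rollback stays cheap only because the old cluster is kept up until the last step succeeds.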
The problem with this technique is that twice as much capacity is required during the
transition. If the service fits on a single machine, having two machines running for the
duration of the upgrade is reasonable.
If there are 1000 machines, proportional shedding can be very expensive. Keeping 1000
spare machines around may be beyond your budget. In this case, once a certain percentage
of traffic is diverted to the new cluster, some older machines can be recycled and
redeployed as part of the new cluster.
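
To see how recycling reduces the number of spare machines, here is a back-of-the-envelope calculation in Python. It rests on simplifying assumptions that are not from the text: all machines are identical, required capacity scales linearly with traffic share, and no extra headroom is reserved. The function names machines_needed and peak_machines are hypothetical.

```python
from typing import Iterable, Tuple


def machines_needed(total: int, percent_on_new: int) -> Tuple[int, int]:
    """Return (old, new) machine counts for a given traffic split.

    Assumes capacity scales linearly with traffic share; counts are
    rounded up using integer ceiling division.
    """
    new = (total * percent_on_new + 99) // 100
    old = (total * (100 - percent_on_new) + 99) // 100
    return old, new


def peak_machines(total: int, steps: Iterable[int]) -> int:
    """Largest number of machines in use at any point in the rollout.

    Before each step, the new cluster must already be sized for the
    upcoming share while the old cluster still carries the previous one.
    """
    peak = total
    prev = 0
    for percent in steps:
        old_now, _ = machines_needed(total, prev)
        _, new_next = machines_needed(total, percent)
        peak = max(peak, old_now + new_next)
        prev = percent
    return peak
```

Under these assumptions, moving 1000 machines' worth of traffic in a single step gives peak_machines(1000, [100]) == 2000, while moving it in ten 10 percent steps and recycling old machines after each step gives peak_machines(1000, range(10, 101, 10)) == 1100, so about 100 spare machines would suffice.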