Next the server is upgraded, the upgrade is verified, and the draining process is undone.
Then the upgrade process begins again with the next server.
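The per-server cycle described above can be sketched as a simple loop. This is a minimal illustration, not a definitive implementation: the callables `drain`, `upgrade`, `verify`, and `undrain` are hypothetical placeholders for whatever your load balancer and deployment tools provide, and stopping at the first failed verification is an assumption rather than something the text prescribes.

```python
from typing import Callable, Iterable, List


def rolling_upgrade(
    servers: Iterable[str],
    drain: Callable[[str], None],
    upgrade: Callable[[str], None],
    verify: Callable[[str], bool],
    undrain: Callable[[str], None],
) -> List[str]:
    """Upgrade servers one at a time.

    Returns the servers that were upgraded and put back into service.
    Stops at the first server whose verification fails (an assumption).
    """
    done: List[str] = []
    for server in servers:
        drain(server)            # stop sending new traffic to this server
        upgrade(server)          # install the new release
        if not verify(server):   # check the upgrade before re-adding it
            # Leave the failed server drained and halt, so the problem
            # does not spread to the remaining servers.
            break
        undrain(server)          # undo the drain: back into rotation
        done.append(server)
    return done
```

A caller would compare the returned list with the full server list to see whether the upgrade completed or stopped early.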
Avoiding Code Pushes When Sleepy
The best time to do a code push is during the day. You are wide awake and more
co-workers are available if something goes wrong.
Many organizations do code pushes very late at night. The typical excuse for a
3 AM upgrade is that the upgrade is risky and doing it late at night decreases
exposure.
Doing critical upgrades while half-asleep is a much bigger risk. Ideally, by now
we've convinced you that a much better strategy for reducing risk is automated
testing and small batches.
Alternatively, you can have a team eight time zones east of your primary
location that does code pushes. Those deployments will occur in the middle of the
night for your customers but not for your team.
11.3 Canary
The canary process is a special form of the rolling upgrade that is more appropriate when
large numbers of elements need to be upgraded. If there are hundreds or thousands of
servers or machines, the rolling upgrade process can take a long time. If each server takes 10
minutes, upgrading 1000 servers will take about a week. That would be unacceptable—yet
upgrading all the servers at once is too risky.
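The "about a week" figure follows directly from the numbers in the paragraph above. A short check, assuming the servers are upgraded strictly one after another with no gaps (the function name is illustrative):

```python
MINUTES_PER_DAY = 24 * 60


def sequential_upgrade_days(server_count: int, minutes_per_server: float) -> float:
    """Days needed to upgrade every server one after another."""
    return server_count * minutes_per_server / MINUTES_PER_DAY


# 1000 servers at 10 minutes each is 10,000 minutes.
print(f"{sequential_upgrade_days(1000, 10):.1f} days")  # prints "6.9 days"
```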
The canary process involves upgrading a very small number of replicas, waiting to see if
obvious problems develop, and then moving on to progressively larger groups of machines.
In the old days of coal mining, miners would bring caged canaries into the mines. These
birds are far more sensitive than humans to harmful gases. If your canary started acting sick
or fell from its perch, it was time to get out of the mine before you became incapacitated
by the gases.
Likewise, the canary technique upgrades a single machine and then tests it for a while.
Problems tend to appear in the first 5 or 10 minutes. If the canary lives, a group of machines
are upgraded. There is another wait and more testing, and then a larger group is upgraded.
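The upgrade, wait, test, and grow cycle described above might look roughly like the following sketch. Everything here is an assumption made for illustration: the group sizes, the `upgrade`, `healthy`, and `wait` callables, and the choice to stop as soon as any upgraded server looks unhealthy.

```python
from typing import Callable, List, Sequence


def canary_groups(servers: Sequence[str], sizes: Sequence[int]) -> List[List[str]]:
    """Split servers into consecutive groups of the given sizes.

    Any servers left over after the listed sizes form one final group.
    """
    groups: List[List[str]] = []
    start = 0
    for size in sizes:
        if start >= len(servers):
            break
        groups.append(list(servers[start:start + size]))
        start += size
    if start < len(servers):
        groups.append(list(servers[start:]))
    return groups


def canary_rollout(
    groups: Sequence[Sequence[str]],
    upgrade: Callable[[str], None],
    healthy: Callable[[str], bool],
    wait: Callable[[], None],
) -> int:
    """Upgrade group by group; return how many servers were upgraded.

    After each group, wait and then re-check every upgraded server.
    Stop before the next group if any of them is unhealthy.
    """
    upgraded: List[str] = []
    for group in groups:
        for server in group:
            upgrade(server)
        upgraded.extend(group)
        wait()  # give problems time to appear, e.g. 5 to 10 minutes
        if not all(healthy(s) for s in upgraded):
            break
    return len(upgraded)
```

With 20 servers and sizes `[1, 4]`, `canary_groups` yields a single canary, then a group of 4, then the remaining 15. In practice `wait` could be something like `lambda: time.sleep(600)`.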
A common canary process is to upgrade one server, then one server per minute until 1
percent of all servers are upgraded, and then one server per second until all are upgraded.
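To see how much faster this pacing is than a one-at-a-time rolling upgrade, the schedule can be turned into a rough time estimate. This sketch rests on several assumptions: the first server counts toward the 1 percent, the rates are measured from the start of one upgrade to the start of the next, and pauses between groups are ignored.

```python
def canary_schedule_seconds(total_servers: int, slow_percent: int = 1) -> int:
    """Seconds from the first upgrade starting to the last one starting.

    Slow phase: one server per minute until slow_percent of servers are done.
    Fast phase: one server per second for the rest.
    """
    # Servers handled in the slow phase, including the very first canary.
    slow_count = max(1, total_servers * slow_percent // 100)
    slow_count = min(slow_count, total_servers)
    fast_count = total_servers - slow_count
    # The first server starts at time 0; each later slow-phase server
    # starts 60 seconds after the previous one.
    return (slow_count - 1) * 60 + fast_count * 1


print(canary_schedule_seconds(1000) / 60)  # prints 25.5 (minutes)
```

Under these assumptions, 1000 servers are paced out in about 25 minutes plus whatever pauses are added, compared with roughly a week for upgrading them one after another at 10 minutes each.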
Between each group there may be an extended pause. While this is happening, verification