Upgrading Live Services - The Practice of Cloud System Administration

Upgrading Blog Search

When Tom was an SRE for Google's Blog Search service, the customer-facing

stack was replicated in four datacenters. Each replica was independent of the

others. There was enough capacity that any one stack could be down and the others

could handle the entire traffic load. One at a time, each stack would be drained by

removing it from the GLB, upgrading it, checking it, and then adding it back to the

GLB.

Meanwhile, another part of the system was the “pipeline”: a service that

scanned for new blog posts, ingested them, produced the new corpus, and

distributed it to the four customer-facing stacks. The pipeline was very important

to the entire service, but if it was down, customers would not notice. However,

the freshness of the search results would deteriorate the longer the pipeline was

down. Therefore, uptime was important but not essential, and upgrades were done

by bringing down the entire pipeline.

Many services at Google were architected in a similar way and upgrades were

done in a similar pattern.

11.2 Rolling Upgrades

In a rolling upgrade, individual machines or servers are removed from service, upgraded,

and put back in service. This is repeated for each element being upgraded; the process rolls

through all of them until it is complete.
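
The loop described above can be sketched in Python. This is a minimal illustration rather than the book's implementation: the `Server` class, the `rolling_upgrade` function, and the `health_check` callable are hypothetical names standing in for whatever the load balancer and deployment tooling actually provide. Halting the roll at the first failed check is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Server:
    """Hypothetical stand-in for one machine behind the load balancer."""
    name: str
    version: str
    in_service: bool = True


def rolling_upgrade(
    servers: Iterable[Server],
    new_version: str,
    health_check: Callable[[Server], bool],
) -> List[str]:
    """Upgrade servers one at a time and return the names that were restored.

    Assumption: the roll stops at the first server that fails its check,
    leaving that server out of service and the rest on the old version.
    """
    upgraded = []
    for server in servers:
        server.in_service = False      # remove from service
        server.version = new_version   # placeholder for the real install step
        if not health_check(server):   # verify before it takes traffic again
            break
        server.in_service = True       # put back in service
        upgraded.append(server.name)
    return upgraded


if __name__ == "__main__":
    fleet = [Server(f"web{i}", "1.0") for i in range(1, 4)]
    print(rolling_upgrade(fleet, "1.1", lambda s: s.version == "1.1"))
```

Because only one server is out of service at any moment, the rest of the fleet keeps answering requests while the loop runs.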

The customer sees continuous service because the individual outages are hidden by a

local load balancer. During the upgrade, some customers will see the new software and

some will see the old software. There is a chance that a particular customer will see new

features appear and disappear as sequential requests go to new and old machines. This is

rare due to load balancer stickiness, discussed in Section 4.2.3, and other factors, such as

deploying new features toggled off, as described in Section 2.1.9.

During the upgrade, there is a temporary reduction in capacity. If there are 10 servers,

as each is upgraded, the service is at 90 percent capacity. Therefore, this technique requires

planning to ensure there is sufficient capacity.
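
The capacity arithmetic can be checked with a few lines of Python. This is a rough sketch under simplifying assumptions: all servers are treated as identical, and `peak_load_fraction` (peak demand expressed as a fraction of the full fleet's capacity) is a hypothetical input that would come from your own monitoring.

```python
def capacity_during_upgrade(total_servers: int, servers_out: int = 1) -> float:
    """Fraction of full capacity left while `servers_out` servers are drained."""
    if total_servers <= 0 or not 0 <= servers_out <= total_servers:
        raise ValueError("need 0 <= servers_out <= total_servers and total_servers > 0")
    return (total_servers - servers_out) / total_servers


def can_upgrade(total_servers: int, peak_load_fraction: float, servers_out: int = 1) -> bool:
    """True if the remaining servers can still carry the expected peak load."""
    return peak_load_fraction <= capacity_during_upgrade(total_servers, servers_out)


if __name__ == "__main__":
    print(capacity_during_upgrade(10))  # 0.9, the 10-server example from the text
    print(can_upgrade(10, 0.85))        # True: an 85 percent peak fits in 90 percent
    print(can_upgrade(10, 0.95))        # False: a 95 percent peak does not
```

The same check shows why small fleets are harder to roll: with four servers, taking one out leaves only 75 percent of capacity.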

The process works as follows. First the server or machine is drained. This can be done

by reconfiguring the load balancer to stop sending requests to it or by having the replica

enter “lame duck mode,” as described in Section 2.1.3, where it “lies,” telling the load

balancer it is unhealthy so that the load balancer stops sending requests to it. Eventually no

new traffic will have been received for a while and all in-flight requests will be finished.
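
One way to picture lame duck mode is a replica that tracks its in-flight requests and reports itself unhealthy on demand. The sketch below is illustrative only: the `Replica` class and its method names are hypothetical, and it omits the quiet period ("no new traffic for a while") that a real drain would also wait for.

```python
import threading


class Replica:
    """Hypothetical replica that supports lame duck mode."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._lame_duck = False
        self._in_flight = 0

    def enter_lame_duck(self) -> None:
        """Start reporting unhealthy so the load balancer stops sending requests."""
        with self._lock:
            self._lame_duck = True

    def is_healthy(self) -> bool:
        """The answer the load balancer's health check receives."""
        with self._lock:
            return not self._lame_duck

    def begin_request(self) -> None:
        # Requests that still arrive during lame duck mode are served, not
        # rejected, because the load balancer may take a moment to react.
        with self._lock:
            self._in_flight += 1

    def end_request(self) -> None:
        with self._lock:
            self._in_flight -= 1

    def is_drained(self) -> bool:
        """True once the replica is in lame duck mode with nothing in flight."""
        with self._lock:
            return self._lame_duck and self._in_flight == 0


if __name__ == "__main__":
    replica = Replica()
    replica.begin_request()      # one request is being served
    replica.enter_lame_duck()
    print(replica.is_healthy())  # False: the replica "lies" to the load balancer
    print(replica.is_drained())  # False: a request is still in flight
    replica.end_request()
    print(replica.is_drained())  # True: safe to begin the upgrade
```

The upgrade tooling would poll something like `is_drained()` and begin the upgrade only once it returns true.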
