Upgrading Blog Search
When Tom was an SRE for Google's Blog Search service, the customer-facing
stack was replicated in four datacenters. Each replica was independent of the
others. There was enough capacity that any one stack could be down and the
others could handle the entire traffic load. One at a time, each stack would be
drained by removing it from the GLB, upgrading it, checking it, and then adding
it back to the GLB.
Meanwhile, another part of the system was the “pipeline”: a service that
scanned for new blog posts, ingested them, produced the new corpus, and
distributed it to the four customer-facing stacks. The pipeline was very
important to the entire service, but if it was down, customers would not
notice. However, the freshness of the search results would deteriorate the
longer the pipeline was down. Therefore, uptime was important but not
essential, and upgrades were done by bringing down the entire pipeline.
Many services at Google were architected in a similar way and upgrades were
done in a similar pattern.
11.2 Rolling Upgrades
In a rolling upgrade, individual machines or servers are removed from service, upgraded,
and put back in service. This is repeated for each element being upgraded; the process rolls
through all of them until it is complete.
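
The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Server` class, its fields, and the `rolling_upgrade` function are hypothetical names, and the "upgrade" is a stand-in that just changes a version string.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical model of one machine behind the load balancer.
    name: str
    version: str
    in_service: bool = True


def rolling_upgrade(servers, new_version):
    """Upgrade servers one at a time.

    Returns the smallest number of servers that were in service at any
    point during the upgrade.
    """
    min_in_service = len(servers)
    for server in servers:
        server.in_service = False          # remove from service
        serving = sum(s.in_service for s in servers)
        min_in_service = min(min_in_service, serving)
        server.version = new_version       # stand-in for the real upgrade
        server.in_service = True           # put back in service
    return min_in_service


fleet = [Server(f"web{i}", "1.0") for i in range(1, 4)]
lowest = rolling_upgrade(fleet, "1.1")
print(lowest, [s.version for s in fleet])
```

Because only one server is out of service at a time, the fleet never drops below one less than its full size.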
The customer sees continuous service because the individual outages are hidden by a
local load balancer. During the upgrade, some customers will see the new software and
some will see the old software. There is a chance that a particular customer will see new
features appear and disappear as sequential requests go to new and old machines. This is
rare due to load balancer stickiness, discussed in Section 4.2.3, and other factors, such as
deploying new features toggled off, as described in Section 2.1.9.
During the upgrade, there is a temporary reduction in capacity. If there are 10 servers,
as each is upgraded the service is at 90 percent capacity. Therefore this technique requires
planning to ensure there is sufficient capacity.
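
The capacity arithmetic can be checked with a small calculation. The sketch below assumes every server handles the same load; the function names and the queries-per-second figures are hypothetical, chosen only to illustrate the 10-server example.

```python
def capacity_during_upgrade(total_servers, servers_out=1):
    """Fraction of full capacity left while servers_out servers are drained."""
    if not 0 <= servers_out <= total_servers or total_servers == 0:
        raise ValueError("servers_out must be between 0 and total_servers")
    return (total_servers - servers_out) / total_servers


def can_upgrade(total_servers, per_server_qps, peak_qps, servers_out=1):
    """True if the remaining servers can still carry the peak load."""
    remaining_qps = (total_servers - servers_out) * per_server_qps
    return remaining_qps >= peak_qps


# 10 servers, one out at a time: 90 percent capacity remains.
print(capacity_during_upgrade(10))

# Assumed numbers: each server handles 100 QPS.
print(can_upgrade(10, 100, peak_qps=850))   # 900 QPS left, enough
print(can_upgrade(10, 100, peak_qps=950))   # 900 QPS left, not enough
```

If the second check fails, the upgrade has to wait for a quieter period or for more servers to be added first.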
The process works as follows. First the server or machine is drained. This can be done
by reconfiguring the load balancer to stop sending requests to it or by having the replica
enter “lame duck mode,” as described in Section 2.1.3, where it “lies,” telling the load
balancer it is unhealthy so that the load balancer stops sending requests to it. Eventually no
new traffic will have been received for a while and all in-flight requests will be finished.
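
A minimal sketch of the drain step follows. It assumes a replica that tracks its own in-flight request count and answers a health check. The `Replica` class, `drain` function, and `on_poll` hook are hypothetical names, not part of any real load balancer's API, and the hook exists only so the example can simulate requests completing.

```python
import time


class Replica:
    # Hypothetical replica that can enter lame duck mode.
    def __init__(self):
        self.lame_duck = False
        self.in_flight = 0

    def health_check(self):
        # In lame duck mode the replica "lies" and reports unhealthy,
        # so the load balancer stops sending it new requests.
        return not self.lame_duck

    def finish_request(self):
        if self.in_flight > 0:
            self.in_flight -= 1


def drain(replica, poll_interval=0.01, timeout=5.0, on_poll=None):
    """Enter lame duck mode and wait for in-flight requests to finish.

    Returns True once the replica is idle, False if the timeout expires.
    """
    replica.lame_duck = True
    deadline = time.monotonic() + timeout
    while replica.in_flight > 0:
        if time.monotonic() >= deadline:
            return False
        if on_poll is not None:
            on_poll(replica)      # simulation hook: lets requests complete
        time.sleep(poll_interval)
    return True


replica = Replica()
replica.in_flight = 3
drained = drain(replica, on_poll=Replica.finish_request)
print(drained, replica.health_check(), replica.in_flight)
```

The timeout is a design choice in this sketch: a drain that never completes should surface as a failure rather than block the rest of the upgrade indefinitely.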