Unlucky Read-Only Replicas
At Google, Tom experienced a race condition between a database master and its
replicas. The system carefully sent writes to the master and did all reads from the
replicas. However, one component read data soon after it was updated and often
became confused because it saw outdated information from the replica. As the
team had no time to recode the component, it was simply reconfigured so that
both the write and read connections went to the master. Even though only one read
query out of many had to go to the master, all were sent to the master. Since the
component was used just once or twice a day, this did not create an undue burden
on the master.
As time went on, usage patterns changed and this component was used more
frequently, until eventually it was used all day long. One day there was an outage
because the master became overloaded due to the load from this component. At
that point, the component had to be re-engineered to properly segregate queries.
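The kind of query segregation described here can be sketched as a small router. This is only an illustration, not the design the team actually used: the class name QueryRouter, the needs_fresh_data flag, and the connection objects are all hypothetical. The idea is that writes and the one read that must see a just-written value go to the master, while every other read goes to a replica.

```python
import random


class QueryRouter:
    """Hypothetical router: writes and flagged reads use the master,
    all other reads are spread across the replicas."""

    def __init__(self, master, replicas, rng=None):
        self.master = master
        self.replicas = list(replicas)
        self._rng = rng or random.Random()

    def for_write(self):
        # All writes go to the master.
        return self.master

    def for_read(self, needs_fresh_data=False):
        # Only a read that must observe a very recent write pays the
        # cost of hitting the master; fall back to the master as well
        # if no replicas are configured.
        if needs_fresh_data or not self.replicas:
            return self.master
        return self._rng.choice(self.replicas)
```

With this shape, the component in the story would pass needs_fresh_data=True for its single read-after-write query and leave the default for the rest, so growth in usage adds load to the replicas rather than to the master.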
One more warning against relying on luck rather than official support from the
developers: luck runs out. Relying on luck today may result in disaster at the next software
release. Even though it may be a long-standing policy for the operations staff, it will appear to
the developers as a “surprise request” to support this configuration. The developers could
rightfully refuse to fix the problem because it had not been a supported configuration in the
first place. You can imagine the confusion and the resulting conflict.
2.1.8 Hot Swaps
Service components should be able to be swapped in or out of their service roles without
triggering an overall service outage. Software components may be self-sufficient for
completing a hot swap or may simply be compatible with a load balancer or other redirection
service that controls the process.
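As a rough sketch of the second case, where a redirection service controls the swap, consider a pool of backends that hands out servers in round-robin order. The BackendPool class and its swap_out and swap_in methods are invented for this illustration and do not correspond to any particular load balancer. A real system would also wait for in-flight requests to finish before removing a backend.

```python
class BackendPool:
    """Hypothetical round-robin pool whose members can be swapped
    in or out while requests continue to be served."""

    def __init__(self, backends):
        self._active = list(backends)
        self._next = 0

    def pick(self):
        # Choose the next active backend in round-robin order.
        if not self._active:
            raise RuntimeError("no active backends")
        backend = self._active[self._next % len(self._active)]
        self._next += 1
        return backend

    def swap_out(self, backend):
        # Refuse a swap that would leave the service with no backends,
        # since that would turn maintenance into an outage.
        if backend not in self._active:
            return
        if len(self._active) == 1:
            raise RuntimeError("refusing to remove the last active backend")
        self._active.remove(backend)

    def swap_in(self, backend):
        # Adding a backend makes it eligible for new requests at once.
        if backend not in self._active:
            self._active.append(backend)
```

Callers only ever use pick(), so they never need to know that a backend was removed for maintenance and another one added in its place.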
Some physical components, such as power supplies or disk drives, can be swapped
while still electrically powered on, or “hot.” Hot-swappable devices can be changed
without affecting the rest of the machine. For example, power supplies can be replaced
without stopping operations. Hot-pluggable devices can be installed or removed while the
machine is running. Administrative tasks may be required before or after this operation is
performed. For example, a hard drive may not be recognized unless the operating system is
told to scan for new disks. A new network interface card may be recognized, but the
application server software may not see it without being restarted unless it has been specifically
programmed to periodically scan for new interfaces.
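A server that has been "specifically programmed to periodically scan for new interfaces" might do something along the following lines. This is a minimal sketch: the InterfaceWatcher class is hypothetical, and it assumes a platform where Python's socket.if_nameindex() is available. The function that lists interfaces is passed in, so a different source could be substituted.

```python
import socket


def list_interfaces():
    # socket.if_nameindex() returns (index, name) pairs for the
    # network interfaces the operating system currently knows about.
    return {name for _index, name in socket.if_nameindex()}


class InterfaceWatcher:
    """Hypothetical helper that reports interfaces added or removed
    since the previous poll."""

    def __init__(self, lister=list_interfaces):
        self._lister = lister
        self._known = set(lister())

    def poll(self):
        # Compare the current interface set with the last one seen.
        current = set(self._lister())
        added = current - self._known
        removed = self._known - current
        self._known = current
        return added, removed
```

The application would call poll() from a timer, for example once a minute, and begin listening on any interface reported as added, so no restart is needed after the card is installed.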