ages. Developers have full access to all monitoring output, and Ops staff have full
access to all build/deploy output. That way everyone is fully empowered to
research any issues that come up while oncall or during a failure analysis.
Postmortem Process: In addition to a regular stand-up meeting to review outages
and trends, there should be a thorough postmortem or failure analysis done for
every outage. Recurring patterns of minor failures can point to a larger gap in
process. Findings of a postmortem, specifically the tasks needed to correct
issues, should be added to the current development backlog and prioritized
accordingly.
Game Day Exercises: Sometimes known as “fire drills,” these are deliberate
attempts to test failover and redundancy by triggering service disruption in a planned
fashion. Teams of people stand by to ensure that the “right thing” happens,
and to fix things manually if it does not. Only by inducing failure can you actually
test what will happen when service components fail. A simple example of a
game-day exercise is periodically rebooting randomly selected machines to make sure all
failover systems function properly.
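
The reboot example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the host names are hypothetical, and the `reboot` and `service_healthy` callables are stand-ins that a real exercise would replace with its own tooling. Nothing here describes a specific product or the authors' own procedure.

```python
import random
from typing import Callable, Dict, List, Sequence


def pick_game_day_targets(
    hosts: Sequence[str], count: int, rng: random.Random
) -> List[str]:
    """Choose `count` distinct hosts at random to disrupt during the exercise."""
    if count > len(hosts):
        raise ValueError("cannot pick more targets than there are hosts")
    return rng.sample(list(hosts), count)


def run_game_day(
    hosts: Sequence[str],
    count: int,
    reboot: Callable[[str], None],
    service_healthy: Callable[[], bool],
    rng: random.Random,
) -> Dict[str, bool]:
    """Reboot randomly selected hosts one at a time and record whether the
    service was still healthy after each planned disruption."""
    results: Dict[str, bool] = {}
    for host in pick_game_day_targets(hosts, count, rng):
        reboot(host)  # the planned disruption
        # False means the people standing by need to fix things manually.
        results[host] = service_healthy()
    return results


if __name__ == "__main__":
    fleet = ["web01", "web02", "web03", "db01", "db02"]  # hypothetical hosts
    rebooted: List[str] = []
    outcome = run_game_day(
        fleet,
        count=2,
        reboot=rebooted.append,        # stand-in: only records the host name
        service_healthy=lambda: True,  # stand-in: pretends failover worked
        rng=random.Random(),
    )
    for host, ok in outcome.items():
        print(f"{host}: {'failover OK' if ok else 'needs manual fix'}")
```

Passing the disruptive action and the health check in as parameters keeps the selection logic testable without touching real machines.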
Error Budgets: Striving for perfection discourages innovation, but too much
innovation means taking on too much risk. A system like Google's Error Budgets
brings the two into equilibrium. A certain amount of downtime is permitted each
month (the budget). Until the budget is exhausted, developers may do as many
releases as they wish. Once the budget is exhausted, they may do only emergency
security fixes for the rest of the month. To conserve the error budget, developers
can dedicate more time to testing and to building frameworks that assure successful
releases. This aligns the priorities of operations and developers and helps them work
together better. See Section 19.4 for a full description.
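
As a rough illustration of the release rule described above, the following Python sketch tracks a monthly downtime budget. The 99.9 percent availability target, the 30-day month, and all class and method names are assumptions made for this example; the text does not specify them, and this is not Google's implementation.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Monthly downtime allowance derived from an availability target."""

    availability_target: float               # e.g. 0.999 (assumed value)
    minutes_in_period: float = 30 * 24 * 60  # assumes a 30-day month
    downtime_minutes: float = 0.0            # downtime recorded so far

    @property
    def budget_minutes(self) -> float:
        """Total downtime permitted in the period."""
        return (1.0 - self.availability_target) * self.minutes_in_period

    @property
    def remaining_minutes(self) -> float:
        """Budget left; zero or negative means the budget is exhausted."""
        return self.budget_minutes - self.downtime_minutes

    def record_outage(self, minutes: float) -> None:
        """Charge an outage against the budget."""
        self.downtime_minutes += minutes

    def release_allowed(self, emergency_security_fix: bool = False) -> bool:
        """Any release is allowed while budget remains; once it is
        exhausted, only emergency security fixes are allowed."""
        return emergency_security_fix or self.remaining_minutes > 0


if __name__ == "__main__":
    budget = ErrorBudget(availability_target=0.999)
    print(f"budget: {budget.budget_minutes:.1f} minutes")  # about 43.2
    budget.record_outage(50)  # a 50-minute outage exhausts the budget
    print(budget.release_allowed())                             # False
    print(budget.release_allowed(emergency_security_fix=True))  # True
```

With these assumed numbers the month's budget is roughly 43 minutes, so a single long outage is enough to freeze ordinary releases until the next period.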
8.4.6 Common Technical DevOps Practices
DevOps is, fundamentally, a structural and organizational paradigm. However, to meet the
goals of DevOps, a number of technical practices have been adopted or developed. Again,
not all of them are used by every DevOps organization. These practices are tools in your
toolbox, and you should choose those that will best serve your situation.
Same Development and Operations Toolchain: Development and operations can
best speak the same language by using the same tools wherever possible. This can
be as simple as using the same bug-tracking system for both development and
operations/deployment issues. Another example is having a unified source code
management system that stores not just the product's source code, but also the source
code of operational tools and system configurations.