• Postmortems are published for all to see, with a draft report available within x
hours and a final report completed within y days.
• There is periodic review of alerts by the affected team. There is periodic review of
alerts by a cross-functional team.
• Process change requests require data to measure the problem being fixed.
• Dashboards report data in business terms (i.e., not just technical terms).
• Every “failover procedure” has a “date of last use” dashboard.
• Capacity needs are predicted ahead of need.
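The "date of last use" item above can be illustrated with a small check that flags failover procedures nobody has exercised recently. This is only a sketch under stated assumptions: the procedure names, the dates, and the 90-day threshold are all hypothetical, and a real dashboard would read these records from wherever the team logs its failover exercises.

```python
from datetime import date

# Hypothetical records: failover procedure name -> date it was last exercised.
LAST_USED = {
    "db-primary-failover": date(2024, 1, 10),
    "dns-failover": date(2023, 6, 2),
}


def stale_procedures(last_used, today, max_age_days=90):
    """Return (name, age_in_days) pairs for procedures not exercised
    within max_age_days, oldest first. The threshold is an assumption."""
    stale = [
        (name, (today - used).days)
        for name, used in last_used.items()
        if (today - used).days > max_age_days
    ]
    return sorted(stale, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, age in stale_procedures(LAST_USED, date(2024, 3, 1)):
        print(f"{name}: last used {age} days ago")
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check easy to test and to replay against historical data.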
Level 5: Optimizing
• After process changes are made, before/after data are compared to determine
success.
• Process changes are reverted if before/after data show no improvement.
• Process changes that have been acted on come from a variety of sources.
• At least one process change has come from every step (in recent history).
• Cycle time enjoys month-over-month improvements.
• Decisions are supported by modeling “what if” scenarios using extracted actual
data.
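The before/after comparison described in this level could be sketched as follows. The text does not say how the comparison should be made, so everything here is an assumption for illustration: the function name `evaluate_change`, the use of a simple mean, and the 5% minimum-improvement threshold are hypothetical choices, not a prescribed method.

```python
from statistics import mean


def evaluate_change(before, after, lower_is_better=True, min_improvement=0.05):
    """Compare a metric sampled before and after a process change.

    Returns "keep" if the mean improved by at least min_improvement
    (as a fraction of the 'before' mean), otherwise "revert".
    """
    before_mean = mean(before)
    after_mean = mean(after)
    if before_mean == 0:
        raise ValueError("cannot compute relative change from a zero baseline")
    if lower_is_better:
        improvement = (before_mean - after_mean) / before_mean
    else:
        improvement = (after_mean - before_mean) / before_mean
    return "keep" if improvement >= min_improvement else "revert"


if __name__ == "__main__":
    # Hypothetical cycle times in days, measured before and after a change.
    print(evaluate_change([10, 12, 11], [8, 9, 8]))
    print(evaluate_change([10, 10], [10, 10]))
```

A comparison of means is the simplest possible choice; with noisy or sparse data, a team might prefer medians or a statistical test before deciding to revert.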
A.2 Emergency Response (ER)
Emergency Response covers how outages and disasters are handled. This includes
engineering resilient systems that prevent outages plus technical and non-technical processes
performed during and after outages (response and remediation). These topics are covered
in Chapters 6, 14, and 15.
Sample Assessment Questions
• How are outages detected? (automatic monitoring? user complaints?)
• Is there a playbook for common failover scenarios and outage-related duties?
• Is there an oncall calendar?
• How is the oncall calendar created?
• Can the system withstand failures on the local level (component failure)?
• Can the system withstand failures on the geographic level (alternative
datacenters)?