versus actually fixing it. Indeed, the fix itself may be trivial, but knowing exactly which fix to make
is completely dependent on accurately understanding the problem and its cause; therefore, accurate
root cause diagnosis is vital.
To get in front of a complex issue — that is, understand it and resolve it — use the following ten steps:
1. Define the problem: Establish a clear problem statement. The objective is to capture in
one or two sentences a summary of the technical problem and success criteria. A detailed
explanation will likely be required later, but aim initially to create a concise summary for
circulation to interested parties.
2. Ascertain the problem's impact: The business stakeholders and sponsors often don't want
to know technical details. They want to know the operational and financial impact of the
incident. This must be categorized and monetized to the furthest extent possible. For example,
if you had a website outage, you should estimate the cost to the organization (e.g.,
$10,000/hour). If degraded service is likely, how much will it cost in lost revenue or
reputation? If the incident prevents employees from completing their work (e.g., call center
workers are unproductive), this can be estimated by the cost of wages plus operational impact
(e.g., $10/hour for 50 call center employees plus any overtime to make callbacks).
3. Engage the correct resources: These could be internal or external. In many enterprise
scenarios, it is necessary to formally engage internal resources from other disciplines, such as
storage operations, application support, and incident management. There may be external
suppliers or third parties who should be engaged, such as hardware manufacturers, software
vendors, or implementation consultants. Ensure that all participants are briefed with the
same problem description and have a good understanding of the success criteria.
4. Identify potential causes: Meet with all necessary parties (physically or virtually) to share
the problem description, its impact, and any troubleshooting steps already performed. Consider
proposed options to mitigate the impact or work around the problem. Identify any possibility
to minimize the immediate impact to the business while a long-term solution is sought.
5. Plan and coordinate tasks across teams: Develop a plan consisting of a number of hypotheses
and a number of scenarios that may cause or influence the problem. Seek to prove or
disprove each hypothesis by assigning it to a team with the skills and experience necessary
to test it and reach a conclusion. The intention is to narrow the focus by eliminating
components that are not causing the problem, until eventually the problem component is
found. Iterate around this method until the hypotheses are proven or disproven.
6. Select a communication plan and review: Document the plan and agree on who will keep
management, end users, and the technical team updated. Mutually agree on a time to
reconvene (e.g., every 2 hours or 4 hours may be appropriate). In scenarios with geographically
dispersed teams, maintaining an open conference call to assist troubleshooting can be
useful, but it's still important to plan and execute regular reviews.
7. Identify root cause: After a number of iterations (each iteration should be isolated, repeatable,
and have narrow scope), you will have disproved a number of hypotheses, and hopefully proved
one. Once the cause of the problem is understood, progress to the next step to find a fix.
8. Determine solution: This step involves identifying a resolution to the defined and
understood cause of the problem.
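
The impact estimate described in step 2 is simple arithmetic, and a rough sketch may help when preparing figures for stakeholders. The Python below is illustrative only: the function names, the three-hour incident duration, the 20 overtime hours, and the 1.5x overtime multiplier are all assumptions made for the example, while the $10,000/hour and $10/hour for 50 employees figures are the ones quoted in step 2.

```python
def outage_cost(hours: float, revenue_loss_per_hour: float) -> float:
    """Estimated revenue lost while a service is unavailable."""
    return hours * revenue_loss_per_hour


def idle_staff_cost(
    hours: float,
    staff: int,
    wage_per_hour: float,
    overtime_hours: float = 0.0,
    overtime_multiplier: float = 1.5,  # assumed premium; not from the text
) -> float:
    """Wages paid to unproductive staff, plus overtime spent catching up.

    overtime_hours is the total across all staff (for example, time spent
    making callbacks after the incident is resolved).
    """
    idle_wages = hours * staff * wage_per_hour
    overtime = overtime_hours * wage_per_hour * overtime_multiplier
    return idle_wages + overtime


# Hypothetical three-hour incident using the rates quoted in step 2.
website = outage_cost(hours=3, revenue_loss_per_hour=10_000)
call_center = idle_staff_cost(hours=3, staff=50, wage_per_hour=10,
                              overtime_hours=20)

print(f"Website outage:    ${website:,.0f}")
print(f"Call center staff: ${call_center:,.0f}")
```

Keeping the two estimates as separate functions mirrors the text, which treats lost revenue and lost productivity as distinct categories of impact. Real figures should come from the business, not from defaults in a script.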