DEFINING THE PROBLEM
Investing time to understand the problem and application environment often leads to a higher-quality,
faster resolution. While it is tempting to focus on immediately resolving the problem, complex
problems are rarely resolved until their causes are fully understood. A thorough understanding of
the configuration, patterns, and characteristics of the problem will position you well for
resolving it.
To learn about the problem, you need to identify the major software and hardware components,
review the impact of recent changes, and understand the specific circumstances that cause the
problem condition to occur. The following section provides a framework for these aspects.
Decomposing the problem into constituent components will help isolate the cause of the problem
and identify bottlenecks.
Guidelines for Identifying the Problem
Use the following guidelines to fully comprehend the exact problem you are facing:
Construct a diagram of the end-to-end application environment.
Obtain visibility of major hardware components, paying special attention to components
that may complicate troubleshooting, such as geographically dispersed configurations, local
caching, and network load balancing (NLB). Network load balancers can mask a problem
with an individual server because the problem server may serve only 25% of requests
(assuming four active servers with evenly distributed traffic); therefore, occurrences of
the problem can appear random or inconsistent.
Gather all relevant logs to a single location:
Windows and System Event logs
SQL Server Error Logs
Dump files
Application logs
Construct a timeline of activities and events leading up to the failure.
Retrieve change logs, including any information relating to changes before the problem
occurred and any changes or steps carried out in an attempt to resolve the problem.
Understand the steps necessary to reproduce the problem. If possible, establish a
repeatable process to reproduce the problem and validate it in a test environment.
Agree on success criteria. Where the problem is repeatable, this is easy. With intermittent
problems this can be more difficult, although agreeing to a period of non-occurrence may be
valid (e.g., if the problem occurred daily before troubleshooting and one week then passes
without it, you can consider the issue resolved).
Understand log context (e.g., client, middle tier, or SQL Server). Pay attention to the time
zone on each machine. It may be necessary to convert timestamps from multiple sources to a
common time zone before comparing them.
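
The guidelines on gathering logs, building a timeline, and reconciling time zones can be
illustrated with a short Python sketch. It is only one possible approach, not a prescribed
tool: the source names, UTC offsets, timestamp format, and sample events below are all
hypothetical, and fixed offsets ignore daylight saving time, which real log data may require
you to handle.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical UTC offsets for each log source. Real values depend on how
# each machine is configured, and fixed offsets ignore daylight saving time.
SOURCE_OFFSETS = {
    "client": timezone(timedelta(hours=-5)),
    "middle_tier": timezone(timedelta(hours=0)),
    "sql_server": timezone(timedelta(hours=1)),
}


def to_utc(source, local_timestamp):
    """Interpret a naive local timestamp from `source` and convert it to UTC."""
    naive = datetime.strptime(local_timestamp, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=SOURCE_OFFSETS[source]).astimezone(timezone.utc)


def build_timeline(events):
    """Return (utc_time, source, message) tuples sorted by UTC time."""
    normalized = [(to_utc(src, ts), src, msg) for src, ts, msg in events]
    return sorted(normalized, key=lambda event: event[0])


# Made-up sample events, each recorded in its machine's local time.
events = [
    ("sql_server", "2024-03-01 15:00:05", "Login timeout logged"),
    ("client", "2024-03-01 09:00:07", "Request failed"),
    ("middle_tier", "2024-03-01 14:00:01", "Connection pool exhausted"),
]

timeline = build_timeline(events)
for when, source, message in timeline:
    print(when.isoformat(), source, message)
```

Although the local timestamps suggest the client event came first, the normalized timeline
places the middle-tier event first, followed by SQL Server and then the client. Sorting on a
common time zone is what makes the sequence of events comparable across machines.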
 