Troubleshooting Methodology and Practices - SQL Server 2012 Internals and Troubleshooting

DEFINING THE PROBLEM

Investing time to understand the problem and application environment often leads to a higher-quality and faster problem resolution. While it is tempting to focus on immediately resolving the problem, complex problems are rarely resolved until causes are fully understood. A thorough understanding of the configuration, patterns, and characteristics of the problem will position you well for resolving the problem.

To learn about the problem, you need to identify the major software and hardware components, review the impact of recent changes, and understand the specific circumstances that cause the problem condition to occur. The following section provides a framework for these aspects. Decomposing the problem into constituent components will help isolate the cause of the problem and identify bottlenecks.

Guidelines for Identifying the Problem

Use the following guidelines to fully comprehend the exact problem you are facing:

➤ Construct a diagram of the end-to-end application environment.

➤ Obtain visibility of major hardware components, paying special attention to components that may complicate troubleshooting, such as geographically dispersed configurations, local caching, and network load balancing (NLB). Network load balancers can mask a problem with an individual server because the problem server may only serve traffic for 25% of requests (assuming four active servers); therefore, occurrences of the problem can appear random or inconsistent.

➤ Gather all relevant logs to a single location:
    ➤ Windows and System Event logs
    ➤ SQL Server Error Logs
    ➤ Dump files
    ➤ Application logs

➤ Construct a timeline of activities and events leading up to the failure.

➤ Retrieve change logs, including any information relating to changes before the problem occurred and any changes or steps carried out in an attempt to resolve the problem.

➤ Understand the steps necessary to reproduce the problem. If possible, establish a repeatable process for reproducing the problem and validate it in a test environment.

➤ Agree on success criteria. Where the problem is repeatable, this is easy. With intermittent problems this can be more difficult, although agreeing to a period of non-occurrence may be valid (e.g., if the problem occurred daily before troubleshooting, one week without an occurrence may be enough to consider the issue resolved).

➤ Understand log context (e.g., client, middle tier, or SQL Server). Pay attention to the time zone on each machine. It may be necessary to convert timestamps from multiple sources to a common time zone.
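The guidelines on building a timeline and reconciling time zones can be illustrated with a small script. The following is a minimal Python sketch, not a prescribed tool: it assumes each source records naive local timestamps in the form YYYY-MM-DD HH:MM:SS and that you know each machine's fixed UTC offset (daylight saving transitions are ignored). The function names, source names, offsets, and log messages are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_timestamp: str, utc_offset_hours: float) -> datetime:
    """Parse a naive 'YYYY-MM-DD HH:MM:SS' timestamp and convert it to UTC.

    Assumption: the source machine uses a fixed offset from UTC.
    """
    naive = datetime.strptime(local_timestamp, "%Y-%m-%d %H:%M:%S")
    source_zone = timezone(timedelta(hours=utc_offset_hours))
    return naive.replace(tzinfo=source_zone).astimezone(timezone.utc)


def build_timeline(sources):
    """Merge entries from several sources into one list sorted by UTC time.

    `sources` maps a source name to a pair:
    (utc_offset_hours, [(local_timestamp, message), ...]).
    """
    timeline = []
    for name, (offset, entries) in sources.items():
        for stamp, message in entries:
            timeline.append((to_utc(stamp, offset), name, message))
    return sorted(timeline)


# Hypothetical sample data: three tiers logging in three different time zones.
sample_sources = {
    "client": (-5, [("2012-06-01 09:00:05", "request timed out")]),
    "middle tier": (0, [("2012-06-01 14:00:03", "connection pool exhausted")]),
    "SQL Server": (1, [("2012-06-01 15:00:01", "blocking detected")]),
}

if __name__ == "__main__":
    for when, source, message in build_timeline(sample_sources):
        print(when.isoformat(), source, message)
```

Read in local time, the client entry appears to come first. Once every timestamp is converted to UTC, the SQL Server entry is the earliest, followed by the middle tier and then the client, which is the kind of ordering a timeline needs to get right.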
