The overhead of using PerfMon to monitor servers under a regular workload is typically minimal. Performance becomes a concern when monitoring servers operating in time-sensitive environments (e.g., trading or reservation platforms) or servers suffering acute performance problems, where the monitoring overhead could tip the server over the edge.
Because reading PerfMon counters is the only real overhead of concern, you should consider network time and disk activity during monitoring. If you perceive performance degradation while PerfMon is running, you can quickly and easily stop logging and measure any performance improvement.
NOTE One of the challenges with many performance problems is that you must
obtain a PerfMon log to identify the cause of the problem. Without a log, engineers
and managers can observe poor application performance and hypothesize
about potential causes and remedies, but performance data is needed in order to
diagnose the problem and take remedial action.
Frequently, you just have to accept the risk and overhead of running PerfMon
because there simply is no better way to obtain performance data that will help
solve a problem.
The Impact of Running PerfMon
PerfMon is a lightweight tool, and its impact on any given server is partly related to how PerfMon is configured, but it also depends on the workload of that server while PerfMon is running. To illustrate this, consider two servers: server A is suffering under heavy workload with 99% CPU utilization and poor disk performance, while server B currently runs at 20% CPU with good disk response times. In this case, the impact on server A is likely greater, because PerfMon could consume 1% or 2% of the remaining CPU capacity, whereas that same amount added by PerfMon to server B would have no detectable impact.
Many organizations attempt to reduce the risk and impact to systems by monitoring during periods of low activity, e.g., during lunch or late afternoon, when user volumes and activity are typically lower, but this is usually the worst approach! It is essential to capture data while the problem is happening (typically when concurrency is at its peak), not on either side of it. Additionally, the worse the problem, the easier it is to spot. Problems are often accentuated by user activity, so if they're more likely to occur, and worse when they do happen, you have the best possible chance of capturing a log that contains them.
There are three key factors to consider when determining the impact of PerfMon: sample interval,
number of counters, and disk performance. The following sections take a brief look at each.
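To see how the first two factors interact, the following back-of-the-envelope shell sketch estimates log growth from the sample interval and counter count. The bytes-per-sample figure is an assumed ballpark for illustration, not a documented PerfMon constant:

```shell
#!/bin/sh
# Rough, illustrative estimate of PerfMon log growth.
counters=50           # number of counters in the collector
interval=15           # sample interval in seconds
bytes_per_sample=100  # ASSUMED average bytes per counter per sample
hours=24              # capture duration

samples=$(( hours * 3600 / interval ))           # polls over the capture
bytes=$(( samples * counters * bytes_per_sample ))
echo "Approx. log size after ${hours}h: $(( bytes / 1048576 )) MB"
```

Doubling the interval halves both the sample count and the estimated file size, which is why the interval is usually the first knob to turn for long captures.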
Sample Interval
The sample interval controls the frequency with which PerfMon polls counters to read their values. The more often PerfMon samples, the greater the impact on the server and the more log data generated.
The default is 15 seconds, which is usually fine when tracing for only a few hours; when tracing over longer periods, increasing the sample interval (i.e., sampling less frequently) reduces both the overhead of PerfMon and the size of the log file generated.
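For a long capture, the interval can be set when the collector is created with Windows' logman utility, as in this sketch (the counter paths, log name, and output location are illustrative choices, not prescribed values):

```shell
# Create a counter log sampling every 60 seconds instead of the 15-second
# default, writing a binary-format log to a local folder.
logman create counter LongPerfLog ^
  -c "\Processor(_Total)\% Processor Time" "\PhysicalDisk(_Total)\Avg. Disk sec/Read" ^
  -si 60 -f bin -o C:\PerfLogs\LongPerfLog

# Start the collector, and stop it once the problem window has been captured.
logman start LongPerfLog
logman stop LongPerfLog
```

Here -si sets the sample interval in seconds, -c lists the counters, -f selects the log format, and -o names the output file.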