The overhead of using PerfMon to monitor servers under a regular workload is typically minimal. Performance becomes a concern when monitoring servers operating in time-sensitive environments (e.g., trading or reservation platforms) or servers suffering acute performance problems, where the monitoring overhead could tip the server over the edge.
Because reading PerfMon counters is the only real overhead of concern, you should consider network time and disk activity during monitoring. If you perceive performance degradation while PerfMon is running, you can quickly and easily stop logging and measure any performance improvement.
NOTE One of the challenges with many performance problems is that you must
obtain a PerfMon log to identify the cause of the problem. Without a log, engineers
and managers can observe poor application performance and hypothesize
about potential causes and remedies, but performance data is needed in order to
diagnose the problem and take remedial action.
Frequently, you just have to accept the risk and overhead of running PerfMon
because there simply is no better way to obtain performance data that will help
solve a problem.
The Impact of Running PerfMon
PerfMon is a lightweight tool, and its impact on any given server is partly related to how PerfMon is configured, but it also depends on the workload of that server while PerfMon is running. To illustrate this, consider two servers: server A is suffering under heavy workload with 99% CPU utilization and poor disk performance, while server B currently runs at 20% CPU with good disk response times. In this case, the impact on server A is likely greater, because PerfMon could consume 1% or 2% of the remaining CPU capacity, whereas that same amount added by PerfMon to server B would have no detectable impact.
Many organizations attempt to reduce the risk and impact to systems by monitoring during periods of low activity, e.g., during lunch or late afternoon, when user volumes and activity are typically lower, but this is usually the worst approach! It is essential to capture data while the problem is happening (typically when concurrency is at its peak), not on either side of it. Additionally, the worse the problem, the easier it is to spot. Problems are often accentuated by user activity, so if they're more likely to occur, and worse when they do happen, you have the best possible chance of capturing a log that contains them.
There are three key factors to consider when determining the impact of PerfMon: sample interval,
number of counters, and disk performance. The following sections take a brief look at each.
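To see how the first two factors interact, the following back-of-the-envelope shell sketch estimates log growth from the sample interval and counter count. The bytes-per-sample figure is an assumed ballpark for illustration, not a documented PerfMon constant:

```shell
#!/bin/sh
# Rough, illustrative estimate of PerfMon log growth.
counters=50           # number of counters in the collector
interval=15           # sample interval in seconds
bytes_per_sample=100  # ASSUMED average bytes per counter per sample
hours=24              # capture duration

samples=$(( hours * 3600 / interval ))           # polls over the capture
bytes=$(( samples * counters * bytes_per_sample ))
echo "Approx. log size after ${hours}h: $(( bytes / 1048576 )) MB"
```

Doubling the interval halves both the sample count and the estimated file size, which is why the interval is usually the first knob to turn for long captures.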
Sample Interval
The sample interval controls the frequency with which PerfMon polls counters to read their values. The more often PerfMon samples, the greater the impact on the server and the more log data generated.
The default is 15 seconds, which is usually fine when tracing for only a few hours; when tracing over longer periods, increasing the sample interval (i.e., sampling less frequently) reduces both the overhead of PerfMon and the size of the log file generated.
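For a long capture, the interval can be set when the collector is created with Windows' logman utility, as in this sketch (the counter paths, log name, and output location are illustrative choices, not prescribed values):

```shell
# Create a counter log sampling every 60 seconds instead of the 15-second
# default, writing a binary-format log to a local folder.
logman create counter LongPerfLog ^
  -c "\Processor(_Total)\% Processor Time" "\PhysicalDisk(_Total)\Avg. Disk sec/Read" ^
  -si 60 -f bin -o C:\PerfLogs\LongPerfLog

# Start the collector, and stop it once the problem window has been captured.
logman start LongPerfLog
logman stop LongPerfLog
```

Here -si sets the sample interval in seconds, -c lists the counters, -f selects the log format, and -o names the output file.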