Discovering Disk Bottlenecks
Whether you have an enterprise SAN or a local SATA array, retrieving data from disk is without a doubt
the slowest step in returning query results to a user. There are many ways you can identify and isolate
disk performance issues, and there are a number of alternatives to resolving disk issues.
The marketing teams at SAN manufacturers do a great job of persuading customers that they have the
fastest, most responsive, lowest latency, cheapest SAN available today. The reality is often that no matter
how fast the disks are, how short the seek time is, or how large the cache is, application workload can
cause SQL Server to request or write data faster than most SANs can read or commit data.
At a high level, keeping an eye on % Disk Time will help you understand the proportion of time a disk
spent busy servicing I/O requests. The more direct warning signs, though, are the latency counters
Average Disk Seconds/Read and Average Disk Seconds/Write: you should begin to be concerned about disk
performance if either of these values is 20 milliseconds (ms) or higher. If either is high, there are a
number of additional counters you can use to get a better picture of disk activity.
One of the most useful counters in System Monitor is Average Disk Queue Length. Traditionally it
really did give an indication of queue depth (by dividing queue length by the number of spindles in the
disk array). In the era of SANs, however, which dynamically and intelligently manage the mapping of
LUNs to spindles within a disk group, it has become increasingly difficult to determine exactly how
many spindles host a particular LUN, and whether those spindles are shared with another LUN. This abstraction
by SAN vendors has in many instances drastically improved the performance, reliability, and scalability of
disk subsystems while reducing the cost of ownership.
With this in mind, you can still make use of Average Disk Queue Length; the figure is just more arbitrary
now than it used to be. It remains a great counter to include in your baseline and to reference in the
event of a performance issue because it's usually fairly representative of disk performance. Remember to
add the counters for specific drives rather than the _Total instance, which averages away per-drive
detail and can mask problems, leading you to make false assumptions about disk performance.
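The traditional per-spindle arithmetic described above can be sketched as follows. Note that the ~2-per-spindle rule of thumb and the spindle count itself are assumptions that only hold when you actually know how many physical disks back the LUN, which, as noted, modern SANs often hide:

```python
def queue_depth_per_spindle(avg_queue_length, spindles):
    """Normalize Avg. Disk Queue Length by the number of spindles.

    The traditional rule of thumb is that a sustained value above
    roughly 2 per spindle suggests requests are queuing at the disk.
    With SAN virtualization the true spindle count is often unknown,
    so treat the result as a relative indicator, not a verdict.
    """
    if spindles < 1:
        raise ValueError("spindle count must be at least 1")
    return avg_queue_length / spindles


# Example: a sampled queue length of 12 on a known 4-spindle array
print(queue_depth_per_spindle(12, 4))  # 3.0 -- above the ~2 rule of thumb
```

On a SAN-backed LUN where the spindle count is opaque, the same calculation is still useful for comparing a current sample against your own baseline rather than against the absolute threshold.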
To approximate disk array response time, monitor Average Disk Seconds/Read and Average Disk
Seconds/Write. As their names suggest, these two counters indicate the average time taken to read from
or write to disk. The ceiling value to keep in mind is 20ms: if your disk subsystem takes longer than
20ms to respond to either a read or a write request, you can expect degraded SQL Server performance.
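When you capture these counters to a log (for example, Performance Monitor's CSV output), a short script can flag the samples that breach the 20ms ceiling. This is a minimal sketch, assuming a PDH-style CSV whose first column is the timestamp; the server and counter names shown are illustrative:

```python
import csv
import io

LATENCY_CEILING = 0.020  # 20 ms; perfmon logs these counters in seconds


def slow_samples(csv_text, counter_substring="Avg. Disk sec"):
    r"""Return (timestamp, counter, value) for samples above the ceiling.

    Assumes a PDH/relog-style CSV: the first column is the timestamp
    and the remaining columns are counter paths such as
    \\SERVER\PhysicalDisk(0 C:)\Avg. Disk sec/Read.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    latency_cols = [i for i, name in enumerate(header)
                    if counter_substring in name]
    flagged = []
    for row in reader:
        for i in latency_cols:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # perfmon writes blanks for missed samples
            if value >= LATENCY_CEILING:
                flagged.append((row[0], header[i], value))
    return flagged
```

Feeding a captured log through this highlights exactly which drive exceeded the ceiling and when, which pairs well with the earlier advice to log per-drive instances rather than _Total.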
The good news is that once you've established that you have a disk performance issue, there are usually
several options for remedying it.
Best Practice
There are a few fundamental best practices to mention first, including separation of
SQL Server data and log files. These files have distinctly different I/O patterns. In an
Online Transaction Processing (OLTP) environment, you should expect to see random
reads and writes to the data file, whereas you'll see sequential writes and very low read
activity to the transaction log file. For these reasons you should plan to place these two
files on separate drives. If you're using a SAN with an intelligent controller, this will
allow the controller to manage the physical disk-to-drive mapping better.