Discovering Disk Bottlenecks
Whether you have an enterprise SAN or a local SATA array, retrieving data from disk is without a doubt
the slowest step in returning query results to a user. There are many ways you can identify and isolate
disk performance issues, and there are a number of alternatives to resolving disk issues.
The marketing teams at SAN manufacturers do a great job of persuading customers that they have the
fastest, most responsive, lowest latency, cheapest SAN available today. The reality is often that no matter
how fast the disks are, how short the seek time is, or how large the cache is, application workload can
cause SQL Server to request or write data faster than most SANs can read or commit data.
At a high level, keeping an eye on % Disk Time will help you understand the proportion of time a disk
spent busy servicing I/O requests. The more direct warning signs, though, are the latency counters
Average Disk Seconds/Read and Average Disk Seconds/Write: you should begin to be concerned about disk
performance if either of these values is 20 milliseconds (ms) or higher. If either is high, there are a
number of additional counters you can use to get a better picture of disk activity.
One of the most useful counters in System Monitor is Average Disk Queue Length. Traditionally it
really did give an indication of queue depth (by dividing queue length by the number of spindles in the
disk array). In the era of SANs, however, which dynamically and intelligently manage the mapping of
LUNs to spindles within a disk group, it has become increasingly difficult to determine exactly how
many spindles host a particular LUN, and whether those spindles are shared with another LUN. This abstraction
by SAN vendors has in many instances drastically improved the performance, reliability, and scalability of
disk subsystems while reducing the cost of ownership.
With this in mind, you can still make use of Average Disk Queue Length; the figure is just more arbitrary
now than it used to be. It remains a great counter to include in your baseline and to reference in the
event of a performance issue because it's usually fairly representative of disk performance. Remember to
add the counters for specific drives rather than the _Total instance, which averages away per-drive
detail and can mask problems, leading you to make false assumptions about disk performance.
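The traditional per-spindle arithmetic described above can be sketched as follows. Note that the ~2-per-spindle rule of thumb and the spindle count itself are assumptions that only hold when you actually know how many physical disks back the LUN, which, as noted, modern SANs often hide:

```python
def queue_depth_per_spindle(avg_queue_length, spindles):
    """Normalize Avg. Disk Queue Length by the number of spindles.

    The traditional rule of thumb is that a sustained value above
    roughly 2 per spindle suggests requests are queuing at the disk.
    With SAN virtualization the true spindle count is often unknown,
    so treat the result as a relative indicator, not a verdict.
    """
    if spindles < 1:
        raise ValueError("spindle count must be at least 1")
    return avg_queue_length / spindles


# Example: a sampled queue length of 12 on a known 4-spindle array
print(queue_depth_per_spindle(12, 4))  # 3.0 -- above the ~2 rule of thumb
```

On a SAN-backed LUN where the spindle count is opaque, the same calculation is still useful for comparing a current sample against your own baseline rather than against the absolute threshold.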
To approximate disk array response time, monitor Average Disk Seconds/Read and Average Disk
Seconds/Write. As their names suggest, these two counters indicate the average time taken to read from
or write to disk. The ceiling value to keep in mind is 20ms: if your disk subsystem takes longer than
20ms to respond to either a read or a write request, you can expect degraded SQL Server performance.
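When you capture these counters to a log (for example, Performance Monitor's CSV output), a short script can flag the samples that breach the 20ms ceiling. This is a minimal sketch, assuming a PDH-style CSV whose first column is the timestamp; the server and counter names shown are illustrative:

```python
import csv
import io

LATENCY_CEILING = 0.020  # 20 ms; perfmon logs these counters in seconds


def slow_samples(csv_text, counter_substring="Avg. Disk sec"):
    r"""Return (timestamp, counter, value) for samples above the ceiling.

    Assumes a PDH/relog-style CSV: the first column is the timestamp
    and the remaining columns are counter paths such as
    \\SERVER\PhysicalDisk(0 C:)\Avg. Disk sec/Read.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    latency_cols = [i for i, name in enumerate(header)
                    if counter_substring in name]
    flagged = []
    for row in reader:
        for i in latency_cols:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # perfmon writes blanks for missed samples
            if value >= LATENCY_CEILING:
                flagged.append((row[0], header[i], value))
    return flagged
```

Feeding a captured log through this highlights exactly which drive exceeded the ceiling and when, which pairs well with the earlier advice to log per-drive instances rather than _Total.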
The good news is that once you've established that you have a disk performance issue, there are usually
several options for remedying it.
Best Practice
There are a few fundamental best practices to mention first, including separation of
SQL Server data and log files. These files have distinctly different I/O patterns. In an
Online Transaction Processing (OLTP) environment, you should expect to see random
reads and writes to the data file, whereas you'll see sequential writes and very low read
activity to the transaction log file. For these reasons you should plan to place these two
files on separate drives. If you're using a SAN with an intelligent controller, this will
allow the controller to manage the physical disk-to-drive mapping better.