Among the storage considerations just discussed, the workload characteristics are probably the most important. Workload in this context is defined as the total amount of work performed against the storage subsystem. To better understand the I/O workload on the system, the following should also be understood with respect to the application (a short sketch after the list shows how these might be derived from an I/O trace):
Read:write ratio
Sequential vs. random access
Data access patterns, arrival patterns, and the size of data being requested and/or written with each request
Inter-arrival rates and concurrency (patterns of request arrival rates)
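As a concrete illustration, the sketch below derives several of these characteristics from a hypothetical I/O trace. The trace format (timestamp in seconds, byte offset, request size, and an "R"/"W" flag) is an assumption chosen for illustration, not a standard.

    from statistics import mean

    # Hypothetical trace records: (timestamp_s, offset_bytes, size_bytes, op)
    def characterize(trace):
        reads = [r for r in trace if r[3] == "R"]
        writes = [r for r in trace if r[3] == "W"]
        rw_ratio = len(reads) / max(len(writes), 1)        # read:write ratio
        # A request counts as sequential if it starts exactly where the
        # previous request ended; otherwise it is treated as random.
        seq = sum(1 for prev, cur in zip(trace, trace[1:])
                  if cur[1] == prev[1] + prev[2])
        seq_fraction = seq / max(len(trace) - 1, 1)        # sequential vs. random
        avg_size = mean(r[2] for r in trace)               # request size
        gaps = [b[0] - a[0] for a, b in zip(trace, trace[1:])]
        avg_interarrival = mean(gaps) if gaps else 0.0     # arrival pattern
        return rw_ratio, seq_fraction, avg_size, avg_interarrival

    trace = [(0.000, 0, 8192, "R"),
             (0.001, 8192, 8192, "R"),    # sequential with the previous read
             (0.003, 900000, 4096, "W")]  # random write
    print(characterize(trace))            # -> (2.0, 0.5, ~6827, 0.0015)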
A workload characterized by the number of transactions is called a transaction-based workload and is measured in I/Os per second (IOPS). A workload characterized by large I/Os is called a throughput-based workload and is measured in megabytes per second (MBps). These two workloads are conflicting in nature, and consequently each works best with different configuration settings across all pieces of the storage subsystem.
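To make the relationship between the two metrics concrete: throughput is IOPS multiplied by the average request size. A minimal sketch, with purely illustrative numbers:

    def mbps_from_iops(iops, request_size_kb):
        return iops * request_size_kb / 1024.0

    # Transaction-based: many small requests -> high IOPS, modest MBps
    print(mbps_from_iops(10_000, 8))     # 10,000 IOPS at 8 KB ~ 78 MBps

    # Throughput-based: few large requests -> low IOPS, high MBps
    print(mbps_from_iops(500, 1024))     # 500 IOPS at 1 MB = 500 MBps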
Transaction-Based Workload
High-performance transaction environments cannot be created on low-cost storage subsystems. These low-cost systems have very small caches, so data is constantly flushed in and out of the cache, causing frequent retrieval of data from disk. Modern storage arrays, on the other hand, have larger caches (a 6780 array, for example, has a 64GB cache) and are likely more effective for an OLTP-based workload, where I/O requests are randomly distributed over a large database.
Environments that require many distributed random I/O requests depend heavily on the number of back-end drives available for parallel processing of the host's workload. When data is confined to a few disk drives, requests queue up behind one another, resulting in long response times.
In such implementations, the acceptable IOPS rate is an important factor in deciding how large a storage subsystem is required. A willingness to tolerate higher response times can help reduce cost, but this may not be acceptable to the user community.
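The relationship between drive count, utilization, and response time can be sketched with a simple M/M/1 queuing approximation. Both the model choice and every figure below (such as 180 IOPS per drive) are illustrative assumptions, not measurements of any real array.

    import math

    # Average response time when target_iops is spread evenly over
    # `drives` drives, each capable of per_drive_iops (M/M/1 per pool).
    def estimated_response_ms(target_iops, drives, per_drive_iops):
        service_ms = 1000.0 / per_drive_iops            # mean service time
        utilization = target_iops / (drives * per_drive_iops)
        if utilization >= 1.0:
            return math.inf                             # queue grows without bound
        return service_ms / (1.0 - utilization)         # M/M/1: S / (1 - rho)

    # Too few drives: ~95% utilization, queuing inflates response time
    print(estimated_response_ms(12_000, drives=70, per_drive_iops=180))   # ~117 ms

    # More drives, same workload: ~67% utilization, far lower response time
    print(estimated_response_ms(12_000, drives=100, per_drive_iops=180))  # ~17 ms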
Because these are slow random I/O operations, and because the workload mix can change continually throughout the day, these bottlenecks can be mysterious in nature: they can appear and disappear, or move from one location to another, over time.
Throughput-Based Workload
This kind of workload is typically seen with applications that must send or receive massive amounts of data and therefore use large sequential blocks to reduce disk latency. Throughput rates for these operations depend on the internal bandwidth of the storage subsystem, so fewer drives are needed to reach maximum throughput. In this environment, read operations use the cache to stage larger chunks of data at a time, improving overall performance.
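For a rough sense of sizing in this case, the drive count follows from the target bandwidth, capped by the subsystem's internal bandwidth. A minimal sketch, assuming purely illustrative per-drive and internal-bus figures:

    # All figures are illustrative assumptions, not vendor numbers.
    def drives_for_throughput(target_mbps, per_drive_mbps, subsystem_mbps):
        achievable = min(target_mbps, subsystem_mbps)   # internal bandwidth caps delivery
        drives = -(-achievable // per_drive_mbps)       # ceiling division
        return achievable, drives

    # 1,600 MBps target, 160 MBps sequential per drive, 2,000 MBps internal bus:
    print(drives_for_throughput(1600, 160, 2000))       # (1600, 10): only 10 drives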
A data warehouse or data mart typically falls into this category. Users query large amounts of data, normally as bulk reads; however, read caches may not always help, because it is rare for several users to look at the same data sets simultaneously, or for the same user to look at the same set of data again.
Mixed Workload
Most business environments tend to define the storage system so that it is able to meet both types of workloads, or mixed workloads. Mixed workloads introduce additional challenges around the sizing and configuration of the storage systems. While the combination of workloads may be difficult due to the uniqueness of the data set and
 