TABLE 5.1 Application Workload Profile

| Characteristic      | Value                | Description                                                                 |
|---------------------|----------------------|-----------------------------------------------------------------------------|
| I/O size            | Bytes/kilobytes      | It is best if this value matches or is close to the file system's block size. |
| Access pattern      | Sequential or random | The most common read or write access pattern used.                          |
| File access profile | Data or attribute    | Determine if the app performs I/O operations on many small files.           |
| Bandwidth           | Mbps                 | The bandwidth requirement of the app.                                       |
| Latency sensitivity | Milliseconds         | Is the app sensitive to read or write latency?                              |
I/O Size The I/O size refers to the size of the read and write requests that the application
issues to the disk. It plays a large role in how the file system can be optimized. I/O devices
are inherently less efficient with small I/Os, because every request goes through the same
process no matter how little data it moves. That is why it is a best practice to group small
adjacent I/Os into one bigger I/O using a buffer, so that a single large operation replaces
many small ones that would together take longer and use more resources. Figure 5.2 shows a
comparison between commonly used file systems as benchmarked by Vanninen and Wang from
Clemson University in their paper “On Benchmarking Popular File Systems.”
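
The buffering practice described above can be sketched in a few lines of Python. This is only an illustration, not the mechanism of any particular file system: the class name CoalescingWriter, the flush threshold, and the demo function are all assumptions made for the example. It collects small writes in memory and passes them to the underlying stream as one larger write once a threshold is reached.

```python
import io


class CoalescingWriter:
    """Collects small writes in memory and flushes them as one larger write."""

    def __init__(self, raw, buffer_size=64 * 1024):
        self.raw = raw                  # underlying binary stream
        self.buffer_size = buffer_size  # flush threshold in bytes (assumed value)
        self.pending = bytearray()      # small writes accumulated so far
        self.flush_count = 0            # writes actually issued to the raw stream

    def write(self, data):
        self.pending += data
        # Issue one large write once enough small ones have accumulated.
        if len(self.pending) >= self.buffer_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.raw.write(bytes(self.pending))
            self.flush_count += 1
            self.pending.clear()


def demo():
    """Issue 1000 writes of 100 bytes each through a 4 KB buffer."""
    sink = io.BytesIO()
    writer = CoalescingWriter(sink, buffer_size=4096)
    for _ in range(1000):
        writer.write(b"x" * 100)
    writer.flush()  # push out whatever is still pending
    return writer.flush_count, len(sink.getvalue())
```

In this sketch, demo() returns (25, 100000): the 1000 small writes reach the underlying stream as 25 larger ones. Python's own io.BufferedWriter applies the same idea to real files.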
Access Pattern The access pattern of an application describes how it seeks data on the
storage media: it can read or write a file either sequentially or in random order. The file
system is much easier to tune if the application performs mostly sequential I/O, because
small sequential I/Os can simply be grouped into a single large one. A third access pattern,
called strided access, is typically used by scientific applications. However, it can largely
be considered a form of sequential access with some characteristics of random access, such
as caching.
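
To make the three patterns concrete, the following Python sketch generates the byte offsets that each pattern would touch and shows why only adjacent requests can be merged. The function names, the fixed request size, and the stride parameter are assumptions chosen for illustration; real applications rarely follow one pattern this cleanly.

```python
import random


def sequential_offsets(count, io_size):
    """Each request starts where the previous one ended."""
    return [i * io_size for i in range(count)]


def strided_offsets(count, io_size, stride):
    """Requests start a fixed distance apart; stride > io_size leaves gaps."""
    return [i * stride for i in range(count)]


def random_offsets(count, io_size, file_size, seed=0):
    """Requests land on arbitrary io_size-aligned positions in the file."""
    rng = random.Random(seed)
    blocks = file_size // io_size
    return [rng.randrange(blocks) * io_size for _ in range(count)]


def coalesce(offsets, io_size):
    """Merge adjacent requests into (offset, length) extents."""
    extents = []
    for off in sorted(offsets):
        if extents and extents[-1][0] + extents[-1][1] == off:
            # This request begins exactly where the last extent ends: extend it.
            start, length = extents[-1]
            extents[-1] = (start, length + io_size)
        else:
            extents.append((off, io_size))
    return extents
```

With a 4096-byte request size, coalesce(sequential_offsets(4, 4096), 4096) collapses to the single extent [(0, 16384)], while coalesce(strided_offsets(4, 4096, 16384), 4096) still yields four separate extents because the gaps between requests prevent merging.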
File Access Profile Applications can be either data intensive or attribute intensive.
Data-intensive applications move a lot of data but create or delete few files. At the other
end are attribute-intensive applications, which create and delete a lot of files yet read and
write only a fraction of each. A good example of a data-intensive application that deals with
large amounts of data (typically 100 MB or larger) is a big data application like Apache
Hadoop, which utilizes a form of Google's MapReduce architecture and programming model.
Attribute-intensive applications check a lot of metadata and attributes to perform their
operations. Major examples are revision control systems such as Git, SVN, and CVS.
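
One rough way to place an application on this spectrum is to compare how many bytes it moves with how many metadata operations it performs. The Python sketch below is a hypothetical heuristic, not a method from the text: the OpCounts fields, the function name, and the 1 MB-per-operation threshold are all assumptions, and the counts would have to come from whatever tracing tool is available.

```python
from dataclasses import dataclass


@dataclass
class OpCounts:
    """Operation totals observed for one application (hypothetical fields)."""
    creates: int
    deletes: int
    attribute_lookups: int
    bytes_read: int
    bytes_written: int


def file_access_profile(counts, bytes_per_op_threshold=1024 * 1024):
    """Return "data" or "attribute" based on bytes moved per metadata operation.

    The threshold is an arbitrary assumption for illustration only.
    """
    metadata_ops = counts.creates + counts.deletes + counts.attribute_lookups
    bytes_moved = counts.bytes_read + counts.bytes_written
    if metadata_ops == 0:
        # No file creation, deletion, or attribute checks at all.
        return "data"
    if bytes_moved / metadata_ops >= bytes_per_op_threshold:
        return "data"
    return "attribute"
```

For example, a job that creates 10 files and reads 50 GB would be classified as "data", while a tool that checks attributes on 200,000 files and reads a few kilobytes from each would be classified as "attribute".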