FIGURE 32.4: Using on-memory deduplication provides extra performance
without any administration cost (i.e., installing a new file system).
Deduplication is becoming increasingly important. It is a technique that has been mainly
used to reduce the size of data in various cases, such as in file systems
(ZFS [11]), virtual machines (KVM, XEN [5], VMware [12]), and special
tagged memory zones (KSM [2]). However, using deduplication to improve
I/O caching by effectively increasing the size of the I/O cache incurs signifi-
cant CPU overhead due to the cost of deduplication techniques. On the other
hand, with the increasing number of cores, it makes sense to examine related
trade-offs when trading CPU cycles to improve the cache hit ratio and to
achieve better I/O performance.
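The core idea of using deduplication to enlarge an I/O cache can be sketched as follows. This is a minimal illustration, not the implementation discussed in the text: identical blocks are detected by a content hash and stored once, so the same amount of memory caches more distinct logical blocks. All class and method names here are illustrative.

```python
import hashlib

class DedupCache:
    """Sketch of a deduplicated block cache: blocks with identical
    contents are stored once and shared via their content hash."""

    def __init__(self):
        self.block_map = {}   # logical block id -> content hash
        self.store = {}       # content hash -> (data, refcount)

    def put(self, block_id, data):
        digest = hashlib.sha256(data).hexdigest()
        old = self.block_map.get(block_id)
        if old == digest:
            return                      # same contents already cached
        if old is not None:
            self._release(old)          # drop reference to old contents
        self.block_map[block_id] = digest
        if digest in self.store:
            stored, refs = self.store[digest]
            self.store[digest] = (stored, refs + 1)   # share existing copy
        else:
            self.store[digest] = (data, 1)            # first occurrence

    def get(self, block_id):
        digest = self.block_map.get(block_id)
        return self.store[digest][0] if digest is not None else None

    def _release(self, digest):
        data, refs = self.store[digest]
        if refs <= 1:
            del self.store[digest]
        else:
            self.store[digest] = (data, refs - 1)
```

Hashing every block is exactly the CPU cost the text refers to: each `put` pays for a digest computation in exchange for a potentially higher cache hit ratio.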
Deduplication is a powerful tool: combined with NUMA-aware algorithms and adaptive CPU partitioning, it can significantly reduce latency and overhead. The results in Figure 32.4 confirm that using deduplication for a VM file server improves I/O performance by 30%, as Jin et al. [6] show, offering better quality of service on the same hardware.
An important aspect of trading CPU efficiency for I/O efficiency is to ensure that the CPU cycles being used are not stolen from running applications.
Ideally, only idle resources should be used to perform I/O-related tasks, such
as deduplication. To examine this issue, a framework [10] was designed that is
able to dynamically assign CPUs to I/O or application tasks. The framework
sends tasks to idle cores (including GPUs) and partitions the cores into two categories: those that run specialized tasks and those that run application tasks.
The framework is able to dynamically decide the number of cores that should
be used for executing I/O-related tasks. As there is no monotonic relation
between performance obtained and cores used, all possible partitions should
be examined to find the best one, for example, using Berry et al.'s Armed
Bandit [4] technique. Figure 32.5 shows that any static CPU partitioning can
degrade performance significantly, while using a dynamic algorithm allows
executing I/O tasks and keeping application performance above 90%.
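The dynamic partition search described above can be illustrated with a generic bandit-style loop. The sketch below uses epsilon-greedy selection, which is an assumption on our part; the framework [10] cites Berry et al.'s Armed Bandit technique [4], and its actual algorithm may differ. All function names and the reward definition are illustrative.

```python
import random

def mean(rewards):
    """Average observed reward; 0.0 for an untried partition."""
    return sum(rewards) / len(rewards) if rewards else 0.0

def choose_partition(stats, epsilon=0.1):
    """Epsilon-greedy choice of how many cores to dedicate to
    I/O-related tasks. `stats` maps each candidate core count to the
    rewards observed when running with that partition (e.g., combined
    application and I/O throughput)."""
    if random.random() < epsilon:
        return random.choice(list(stats))                    # explore
    return max(stats, key=lambda cores: mean(stats[cores]))  # exploit best so far

def record(stats, cores, reward):
    """Feed a measured reward back so future choices improve."""
    stats.setdefault(cores, []).append(reward)
```

Because there is no monotonic relation between core count and performance, the exploration step keeps re-sampling partitions that currently look suboptimal, which is what lets the dynamic scheme avoid the worst cases of any fixed static partitioning.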
Overall, this preliminary investigation shows that performing I/O-related tasks in parallel, without hurting application performance, can lead to better I/O performance.