An application in which 100% of the graph is available in the object cache (as well as in
the filesystem cache, and on disk) will be more performant than one in which 100% is
available on disk, but only 80% in the filesystem cache and 20% in the object cache.
Performance optimization options
There are, then, three areas in which we can optimize for performance:
• Increase the object cache (from 2 GB, all the way up to 200 GB or more in exceptional
circumstances)
• Increase the percentage of the store mapped into the filesystem cache
• Invest in faster disks: SSDs or enterprise flash hardware
The first two options here require adding more RAM. In costing the allocation of RAM,
however, there are a couple of things to bear in mind. First, whereas the size of the store
files in the filesystem cache maps one-to-one with the size on disk, graph objects in the
object cache can be up to 10 times bigger than their on-disk representations. Allocating
RAM to the object cache is, therefore, far more expensive per graph element than
allocating it to the filesystem cache. The second point to bear in mind relates to the location
of the object cache. If our graph database uses an on-heap cache, as does Neo4j, then
increasing the size of the cache requires allocating more heap. Most modern JVMs do
not cope well with heaps much larger than 8 GB. Once we start growing the heap beyond
this size, garbage collection can impact the performance of our application.⁸
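To see why this matters in practice, here is a minimal back-of-envelope sketch comparing the RAM needed to hold the same store in the filesystem cache versus the object cache. The 20 GB store size is an illustrative assumption, and the 10x expansion factor is the upper bound quoted above; the real ratio depends on the shape of the graph and the cache implementation.

// Rough comparison of RAM cost: filesystem cache vs. object cache.
// Store size is illustrative; the 10x factor is a worst-case bound, not a constant.
public class CacheCostSketch {
    public static void main(String[] args) {
        long storeSizeOnDiskGb = 20;        // assumed on-disk store size
        int objectCacheExpansion = 10;      // graph objects can be up to ~10x their on-disk size

        long filesystemCacheGb = storeSizeOnDiskGb;                    // maps one-to-one with disk
        long objectCacheGb = storeSizeOnDiskGb * objectCacheExpansion; // worst-case in-heap footprint

        System.out.printf("Filesystem cache needed: %d GB%n", filesystemCacheGb);
        System.out.printf("Object cache needed (worst case): %d GB%n", objectCacheGb);
    }
}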
As Figure 4-11 shows, the sweet spot for any cost versus performance trade-off lies
around the point where we can map our store files in their entirety into RAM, while
allowing for a healthy, but modestly sized object cache. Heaps of between 4 and 8 GB
are not uncommon, though in many cases, a smaller heap can actually improve
performance (by mitigating expensive GC behaviors).
Calculating how much RAM to allocate to the heap and the filesystem cache depends
on our knowing the projected size of our graph. Building a representative dataset early
in our application's development life cycle will furnish us with some of the data we need
to make our calculations. If we cannot fit the entire graph into main memory (that is,
at a minimum, into the filesystem cache), we should consider cache sharding (see “Cache
sharding” on page 80).
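As a rough planning aid, the sketch below turns a projected store size into a candidate RAM budget along the lines just described: a filesystem cache sized to the whole store, a modest heap for the object cache, and a check of whether the machine has enough memory, failing which cache sharding becomes the fallback. All figures are assumptions for illustration, not recommendations.

// Rough RAM-budget check based on a projected store size (all figures illustrative).
public class RamBudgetSketch {
    public static void main(String[] args) {
        long projectedStoreGb = 50;   // projected on-disk size from a representative dataset
        long heapGb = 6;              // modest heap for the object cache (4-8 GB range noted above)
        long osAndOverheadGb = 4;     // headroom for the OS and other processes
        long machineRamGb = 64;       // RAM available on the candidate server

        long requiredGb = projectedStoreGb + heapGb + osAndOverheadGb;
        if (requiredGb <= machineRamGb) {
            System.out.printf("Fits: map the full %d GB store into the filesystem cache.%n",
                    projectedStoreGb);
        } else {
            System.out.printf("Short by %d GB: consider cache sharding across instances.%n",
                    requiredGb - machineRamGb);
        }
    }
}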
8. Neo4j Enterprise Edition includes a cache implementation that mitigates the problems encountered with
large heaps, and is being successfully used with heaps in the order of 200 GB.