An application in which 100% of the graph is available in the object cache (as well as in
the filesystem cache, and on disk) will be more performant than one in which 100% is
available on disk, but only 80% in the filesystem cache and 20% in the object cache.
Performance optimization options
There are, then, three areas in which we can optimize for performance:
• Increase the object cache (from 2 GB, all the way up to 200 GB or more in exceptional
circumstances)
• Increase the percentage of the store mapped into the filesystem cache
• Invest in faster disks: SSDs or enterprise flash hardware
The first two options here require adding more RAM. In costing the allocation of RAM,
however, there are a couple of things to bear in mind. First, whereas the size of the store
files in the filesystem cache maps one-to-one with the size on disk, graph objects in the
object cache can be up to 10 times bigger than their on-disk representations. Allocating
RAM to the object cache is, therefore, far more expensive per graph element than
allocating it to the filesystem cache. The second point to bear in mind relates to the location
of the object cache. If our graph database uses an on-heap cache, as does Neo4j, then
increasing the size of the cache requires allocating more heap. Most modern JVMs do
not cope well with heaps much larger than 8 GB. Once we start growing the heap beyond
this size, garbage collection can impact the performance of our application.⁸
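To see why this matters in practice, here is a minimal back-of-envelope sketch comparing the RAM needed to hold the same store in the filesystem cache versus the object cache. The 20 GB store size is an illustrative assumption, and the 10x expansion factor is the upper bound quoted above; the real ratio depends on the shape of the graph and the cache implementation.

// Rough comparison of RAM cost: filesystem cache vs. object cache.
// Store size is illustrative; the 10x factor is a worst-case bound, not a constant.
public class CacheCostSketch {
    public static void main(String[] args) {
        long storeSizeOnDiskGb = 20;        // assumed on-disk store size
        int objectCacheExpansion = 10;      // graph objects can be up to ~10x their on-disk size

        long filesystemCacheGb = storeSizeOnDiskGb;                    // maps one-to-one with disk
        long objectCacheGb = storeSizeOnDiskGb * objectCacheExpansion; // worst-case in-heap footprint

        System.out.printf("Filesystem cache needed: %d GB%n", filesystemCacheGb);
        System.out.printf("Object cache needed (worst case): %d GB%n", objectCacheGb);
    }
}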
As Figure 4-11 shows, the sweet spot for any cost versus performance trade-off lies
around the point where we can map our store files in their entirety into RAM, while
allowing for a healthy, but modestly sized object cache. Heaps of between 4 and 8 GB
are not uncommon, though in many cases, a smaller heap can actually improve
performance (by mitigating expensive GC behaviors).
Calculating how much RAM to allocate to the heap and the filesystem cache depends
on our knowing the projected size of our graph. Building a representative dataset early
in our application's development life cycle will furnish us with some of the data we need
to make our calculations. If we cannot fit the entire graph into main memory (that is,
at a minimum, into the filesystem cache), we should consider cache sharding (see “Cache
sharding” on page 80).
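As a rough planning aid, the sketch below turns a projected store size into a candidate RAM budget along the lines just described: a filesystem cache sized to the whole store, a modest heap for the object cache, and a check of whether the machine has enough memory, failing which cache sharding becomes the fallback. All figures are assumptions for illustration, not recommendations.

// Rough RAM-budget check based on a projected store size (all figures illustrative).
public class RamBudgetSketch {
    public static void main(String[] args) {
        long projectedStoreGb = 50;   // projected on-disk size from a representative dataset
        long heapGb = 6;              // modest heap for the object cache (4-8 GB range noted above)
        long osAndOverheadGb = 4;     // headroom for the OS and other processes
        long machineRamGb = 64;       // RAM available on the candidate server

        long requiredGb = projectedStoreGb + heapGb + osAndOverheadGb;
        if (requiredGb <= machineRamGb) {
            System.out.printf("Fits: map the full %d GB store into the filesystem cache.%n",
                    projectedStoreGb);
        } else {
            System.out.printf("Short by %d GB: consider cache sharding across instances.%n",
                    requiredGb - machineRamGb);
        }
    }
}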
8. Neo4j Enterprise Edition includes a cache implementation that mitigates the problems encountered with
large heaps, and is being successfully used with heaps in the order of 200 GB.