Building a Graph Database Application - Graph Databases

Redundancy

We can optimize for redundancy and availability by ensuring the database cluster is big enough to survive a certain number of machine failures (for example, to survive two machines failing, we will need a cluster comprising five instances).
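The five-instance figure is consistent with a majority-quorum sizing rule, in which tolerating F failed machines requires 2F + 1 instances. The text gives only the two-failures example, so the rule itself is an assumption here, and the function name `min_cluster_size` is hypothetical. A minimal sketch under that assumption:

```python
def min_cluster_size(tolerated_failures: int) -> int:
    """Smallest cluster that still has a majority after the given failures.

    Assumes majority-quorum sizing (2F + 1). The source states only the
    example of two failures needing five instances, not the general rule.
    """
    if tolerated_failures < 0:
        raise ValueError("tolerated_failures must be non-negative")
    return 2 * tolerated_failures + 1


if __name__ == "__main__":
    # Surviving two machine failures, as in the example in the text.
    print(min_cluster_size(2))
```

Under this assumption, each additional failure to be tolerated adds two machines to the cost of the cluster.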

Load

With a replicated graph database solution, we can optimize for load by scaling horizontally (for read load) and vertically (for write load).

Performance

Redundancy and load can be costed in terms of the number of machines necessary to ensure availability (five machines to provide continued availability in the face of two machines failing, for example) and scalability (one machine per some number of concurrent requests, as per the calculations in “Load” on page 97). But what about performance? How can we cost performance?

Calculating the cost of graph database performance

In order to understand the cost implications of optimizing for performance, we need to understand the performance characteristics of the database stack. As we describe in more detail later in “Native Graph Storage” on page 144, a graph database uses disk for durable storage, and main memory for caching portions of the graph. In Neo4j, the caching parts of main memory are further divided between the filesystem cache (which is typically managed by the operating system) and the object cache. The filesystem cache is a portion of off-heap RAM into which files on disk are read and cached before being served to the application. The object cache is an on-heap cache that stores object instances of nodes, relationships, and properties.

Spinning disk is by far the slowest part of the database stack. Queries that have to reach all the way down to spinning disk will be orders of magnitude slower than queries that touch only the object cache. Disk access can be improved by using solid-state drives (SSDs) in place of spinning disks, providing an approximately 20-fold increase in performance, or by using enterprise flash hardware, which can reduce latencies even further.

Spinning disks and SSDs are cheap, but not very fast. Far more significant performance benefits accrue when the database has to deal only with the caching layers. The filesystem cache offers up to 500 times the performance of spinning disk, whereas the object cache can be up to 5,000 times faster.

For comparative purposes, graph database performance can be expressed as a function of the percentage of data available at each level of the hierarchy formed by the object cache, the filesystem cache, and disk:

(% graph in object cache * 5000) * (% graph in filesystem cache * 500) * 20 (if using SSDs)
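As a rough sketch, the expression above can be written as a small Python function. It assumes the percentages are passed as percentage points (0 to 100) and that the factor of 20 applies only when SSDs are in use. The function and parameter names are hypothetical, and the result is a comparative score rather than a measured throughput.

```python
def relative_performance(pct_in_object_cache: float,
                         pct_in_filesystem_cache: float,
                         using_ssds: bool = False) -> float:
    """Comparative performance score, transcribed literally from the text.

    Assumption: both percentages are percentage points in the range 0-100.
    The multipliers (5000, 500, 20) are the ones given in the text.
    """
    for pct in (pct_in_object_cache, pct_in_filesystem_cache):
        if not 0 <= pct <= 100:
            raise ValueError("percentages must be between 0 and 100")

    score = (pct_in_object_cache * 5000) * (pct_in_filesystem_cache * 500)
    if using_ssds:
        score *= 20  # SSD factor from the text
    return score


if __name__ == "__main__":
    # Compare the same cache coverage on spinning disk and on SSDs.
    on_disk = relative_performance(10, 20)
    on_ssd = relative_performance(10, 20, using_ssds=True)
    print(on_ssd / on_disk)
```

Because the terms are multiplied, the score is only meaningful for comparing configurations against one another, and it drops to zero if either cache holds none of the graph.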
