Redundancy
We can optimize for redundancy and availability by ensuring the database cluster
is big enough to survive a certain number of machine failures (e.g., to survive two
machines failing, we will need a cluster comprising five instances).
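The five-instance figure is consistent with a majority-quorum rule, under which a cluster must keep more than half of its members alive to remain available. The text does not state the general rule, so the sketch below treats it as an assumption; the function name `min_cluster_size` is hypothetical.

```python
def min_cluster_size(failures_to_survive: int) -> int:
    """Smallest cluster that still has a majority after the given failures.

    Assumption: availability requires a strict majority of instances,
    which gives 2f + 1 instances to survive f failures.
    """
    if failures_to_survive < 0:
        raise ValueError("failures_to_survive must be non-negative")
    return 2 * failures_to_survive + 1


if __name__ == "__main__":
    # The example from the text: surviving two failed machines.
    print(min_cluster_size(2))
```

Under this assumption, tolerating each additional failure costs two more machines.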
Load
With a replicated graph database solution, we can optimize for load by scaling
horizontally (for read load), and vertically (for write load).
Performance
Redundancy and load can be costed in terms of the number of machines necessary to
ensure availability (five machines to provide continued availability in the face of two
machines failing, for example) and scalability (one machine per some number of concurrent
requests, as per the calculations in “Load” on page 97). But what about performance?
How can we cost performance?
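As a rough illustration of costing redundancy and load in machines, the sketch below takes the larger of the two requirements. The per-machine capacity, the request figures, and the choice to combine the two requirements with `max` are all assumptions made for illustration; the redundancy term also assumes a majority-quorum cluster, and the function names are hypothetical.

```python
def machines_for_load(concurrent_requests: int, requests_per_machine: int) -> int:
    """One machine per `requests_per_machine` concurrent requests, rounded up."""
    if concurrent_requests < 0 or requests_per_machine <= 0:
        raise ValueError("requests must be >= 0 and per-machine capacity > 0")
    # Integer ceiling division avoids floating-point rounding.
    return (concurrent_requests + requests_per_machine - 1) // requests_per_machine


def machines_to_cost(failures_to_survive: int,
                     concurrent_requests: int,
                     requests_per_machine: int) -> int:
    """Machine count satisfying both requirements (assumed: the larger one wins)."""
    for_redundancy = 2 * failures_to_survive + 1  # assumed majority-quorum rule
    for_load = machines_for_load(concurrent_requests, requests_per_machine)
    return max(for_redundancy, for_load)


if __name__ == "__main__":
    # Hypothetical figures: 1,000 concurrent requests, 300 per machine.
    print(machines_to_cost(2, 1000, 300))
```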
Calculating the cost of graph database performance
In order to understand the cost implications of optimizing for performance, we need
to understand the performance characteristics of the database stack. As we describe in
more detail later in “Native Graph Storage” on page 144, a graph database uses disk for
durable storage, and main memory for caching portions of the graph. In Neo4j, the
caching parts of main memory are further divided between the filesystem cache (which
is typically managed by the operating system) and the object cache. The filesystem cache
is a portion of off-heap RAM into which files on disk are read and cached before being
served to the application. The object cache is an on-heap cache that stores object
instances of nodes, relationships, and properties.
Spinning disk is by far the slowest part of the database stack. Queries that have to reach
all the way down to spinning disk will be orders of magnitude slower than queries that
touch only the object cache. Disk access can be improved by using solid-state drives
(SSDs) in place of spinning disks, providing an approximately 20-fold increase in
performance, or by using enterprise flash hardware, which can reduce latencies even
further.
Spinning disks and SSDs are cheap, but not very fast. Far more significant performance
benefits accrue when the database has to deal only with the caching layers. The filesystem
cache offers up to 500 times the performance of spinning disk, whereas the object cache
can be up to 5,000 times faster.
For comparative purposes, graph database performance can be expressed as a function
of the percentage of data available at each level of the object cache-filesystem cache-disk
hierarchy:
(% graph in object cache * 5000) * (% graph in filesystem cache * 500) * 20 (if using SSDs)
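The comparative expression above can be evaluated directly. The sketch below is a literal transcription, assuming the percentages are supplied as fractions between 0 and 1 and that the SSD factor is 1 when spinning disk is used. The function name and the sample inputs are hypothetical, and the result is only a relative score for comparing configurations, not a latency or throughput figure.

```python
OBJECT_CACHE_FACTOR = 5000      # object cache vs. spinning disk (from the text)
FILESYSTEM_CACHE_FACTOR = 500   # filesystem cache vs. spinning disk (from the text)
SSD_FACTOR = 20                 # SSD vs. spinning disk (from the text)


def relative_performance(frac_in_object_cache: float,
                         frac_in_filesystem_cache: float,
                         using_ssd: bool = False) -> float:
    """Literal evaluation of the comparative formula.

    Assumption: each '% graph' term is a fraction in [0, 1].
    """
    for frac in (frac_in_object_cache, frac_in_filesystem_cache):
        if not 0.0 <= frac <= 1.0:
            raise ValueError("cache fractions must lie in [0, 1]")
    disk_factor = SSD_FACTOR if using_ssd else 1
    return ((frac_in_object_cache * OBJECT_CACHE_FACTOR)
            * (frac_in_filesystem_cache * FILESYSTEM_CACHE_FACTOR)
            * disk_factor)


if __name__ == "__main__":
    # Hypothetical configuration: 25% of the graph in the object cache,
    # 50% in the filesystem cache, with and without SSDs.
    print(relative_performance(0.25, 0.5, using_ssd=False))
    print(relative_performance(0.25, 0.5, using_ssd=True))
```

Because the terms are multiplied, the score drops to zero if either cache holds none of the graph, so it is best read as a way to rank candidate configurations against one another.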