Graph Database Internals - Graph Databases


availability is maintained. But as we'll now discuss, scale itself is more nuanced than simply the number of servers we deploy.

Scale

The topic of scale has become more important as data volumes have grown. In fact, the problems of data at scale, which have proven difficult to solve with relational databases, have been a substantial motivation for the NOSQL movement. In some sense, graph databases are no different; after all, they also need to scale to meet the workload demands of modern applications. But scale isn't a simple value like transactions per second: it's an aggregate value that we measure across multiple axes.

For graph databases, we will decompose our broad discussion of scale into three key themes:

1. Capacity (graph size)

2. Latency (response time)

3. Read and write throughput

Capacity

Some graph database vendors have chosen to eschew any upper bound on graph size, at the expense of performance and storage efficiency. Neo4j has historically taken a different approach, maintaining a “sweet spot” that achieves faster performance and lower storage (and consequently a smaller memory footprint and fewer I/O operations) by optimizing for graph sizes at or below the 95th percentile of use cases. The reason for this trade-off lies in the fixed record sizes and pointers that Neo4j uses extensively inside its store (as discussed in “Native Graph Storage” on page 144). At the time of writing, the 1.9 release of Neo4j can support single graphs having tens of billions of nodes, relationships, and properties. This allows for graphs with a social networking dataset roughly the size of Facebook's.
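A minimal sketch can make the fixed-record trade-off concrete. It assumes, for illustration only, a store file of equal-sized records in which a record's ID is also its position in the file; the 9-byte record size and 35-bit ID width are assumptions, not figures given in this section, and the function names are hypothetical. Equal sizes reduce a lookup to one multiplication, while the ID width fixes how many records the store can ever address.

```python
# Hypothetical fixed-size record store; the sizes below are illustrative assumptions.
NODE_RECORD_SIZE = 9  # bytes per node record (assumed)
ID_BITS = 35          # width of a record ID / pointer in bits (assumed)


def max_records(id_bits: int = ID_BITS) -> int:
    """How many distinct records an ID of `id_bits` bits can address."""
    return 2 ** id_bits


def record_offset(record_id: int, record_size: int = NODE_RECORD_SIZE) -> int:
    """Byte offset of a record: with fixed sizes, lookup is a single multiplication."""
    if not 0 <= record_id < max_records():
        # The pointer width is a hard ceiling on graph size.
        raise ValueError("record ID outside the addressable range")
    return record_id * record_size


def bits_needed(record_count: int) -> int:
    """Smallest ID width able to address IDs 0..record_count - 1."""
    return max(1, (record_count - 1).bit_length())


if __name__ == "__main__":
    print(record_offset(1_000))      # 9000
    print(max_records())             # 34359738368, roughly 34 billion
    print(bits_needed(100 * 10**9))  # 37
```

Under these assumptions, a 35-bit ID addresses about 34 billion records, in line with the “tens of billions” figure above, and reaching 100 billion would need at least 37 bits. Widening the ID enlarges every pointer in every record, which is the kind of storage and memory cost a bounded sweet spot avoids.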

The Neo4j team has publicly expressed the intention to support 100B+ nodes/relationships/properties in a single graph as part of its 2013 roadmap.

How large must a dataset be to take advantage of all the benefits a graph database has to offer? The answer is: smaller than you might think. For queries of second or third degree, the performance benefits show up with datasets of just a few thousand nodes. The higher the degree of the query, the larger the performance gap. The ease-of-development benefits are, of course, unrelated to data volume, and available regardless
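Here the degree of a query means its traversal depth: friends-of-friends is a second-degree query. The sketch below assumes a plain in-memory adjacency list as a stand-in for a graph store's relationship chains; `k_hop_neighbours` and the sample data are hypothetical. A depth-limited breadth-first traversal only touches the start node's neighbourhood, so its work depends on local fan-out and query depth rather than on the total number of nodes in the graph.

```python
from collections import deque


def k_hop_neighbours(adjacency: dict[str, list[str]], start: str, depth: int) -> tuple[set[str], int]:
    """Return nodes reachable from `start` in 1..`depth` hops and the number of edges examined."""
    seen = {start}
    reached: set[str] = set()
    edges_examined = 0
    frontier = deque([(start, 0)])  # (node, hops from start)
    while frontier:
        node, hops = frontier.popleft()
        if hops == depth:
            continue  # never expand beyond the requested query degree
        for neighbour in adjacency.get(node, []):
            edges_examined += 1
            if neighbour not in seen:
                seen.add(neighbour)
                reached.add(neighbour)
                frontier.append((neighbour, hops + 1))
    return reached, edges_examined


if __name__ == "__main__":
    graph = {
        "alice": ["bob", "carol"],
        "bob": ["dave"],
        "carol": ["dave", "eve"],
        "dave": ["frank"],
    }
    # 10,000 nodes that are unreachable from "alice" (hypothetical padding).
    bigger = {**graph, **{f"n{i}": [f"n{i + 1}"] for i in range(10_000)}}

    reached, cost = k_hop_neighbours(graph, "alice", 2)
    print(sorted(reached))  # ['bob', 'carol', 'dave', 'eve']
    print(cost)             # 5
    print(k_hop_neighbours(bigger, "alice", 2) == (reached, cost))  # True
```

Adding unrelated nodes leaves the edge count unchanged, while each extra hop can multiply the work by roughly the average fan-out.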
