availability is maintained. But as we'll now discuss, scale itself is more nuanced than
simply the number of servers we deploy.
Scale
The topic of scale has become more important as data volumes have grown. In fact, the
problems of data at scale, which have proven difficult to solve with relational databases,
have been a substantial motivation for the NOSQL movement. In some sense, graph
databases are no different; after all, they also need to scale to meet the workload demands
of modern applications. But scale isn't a simple value like transactions per second: it's
an aggregate value that we measure across multiple axes.
For graph databases, we will decompose our broad discussion on scale into three key
themes:
1. Capacity (graph size)
2. Latency (response time)
3. Read and write throughput
Capacity
Some graph database vendors have chosen to eschew any upper bound on graph size, at the
cost of performance and storage efficiency. Neo4j has historically taken a different
approach, maintaining a “sweet spot” that achieves faster performance
and lower storage (and consequently a smaller memory footprint and fewer I/O operations) by
optimizing for graph sizes that lie at or below the 95th percentile of use cases. The reason
for the trade-off lies in the use of fixed record sizes and pointers, which (as discussed
in “Native Graph Storage” on page 144) it uses extensively inside the store. At the
time of writing, the 1.9 release of Neo4j can support single graphs having tens of billions
of nodes, relationships, and properties. This allows for graphs with a social networking
dataset roughly the size of Facebook's.
The Neo4j team has publicly expressed the intention to support 100B+
nodes/relationships/properties in a single graph as part of its 2013
roadmap.
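The link between fixed record sizes, pointers, and an upper bound on graph size can be made concrete with a little arithmetic. The sketch below is illustrative only: the record size and pointer width are assumptions chosen for the example rather than confirmed Neo4j internals, and the function names are hypothetical. It shows that a fixed-size record can be located by multiplication alone, and that the pointer width caps how many records a store can address.

```python
# Illustrative sketch only: the record size and pointer width below are
# assumptions chosen for the example, not confirmed Neo4j internals.
RECORD_SIZE_BYTES = 9   # hypothetical fixed size of one record
POINTER_BITS = 35       # hypothetical width of a record ID (pointer)


def record_offset(record_id: int, record_size: int = RECORD_SIZE_BYTES) -> int:
    """Return the byte offset of a record in a fixed-size record store.

    Because every record has the same size, no index lookup is needed:
    the offset is simply the ID multiplied by the record size.
    """
    if record_id < 0:
        raise ValueError("record_id must be non-negative")
    return record_id * record_size


def max_records(pointer_bits: int = POINTER_BITS) -> int:
    """Return how many distinct records a pointer of this width can address."""
    return 2 ** pointer_bits


def max_store_bytes(pointer_bits: int = POINTER_BITS,
                    record_size: int = RECORD_SIZE_BYTES) -> int:
    """Return the size of a store file holding every addressable record."""
    return max_records(pointer_bits) * record_size


if __name__ == "__main__":
    print("offset of record 100:", record_offset(100))
    print("addressable records:", max_records())
    print("full store size in bytes:", max_store_bytes())
```

Under these assumed values, a 35-bit pointer addresses roughly 34 billion records. Widening the pointer raises that ceiling, but every record that stores pointers grows with it, which is the storage and memory cost the sweet-spot approach avoids.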
How large must a dataset be to take advantage of all of the benefits a graph database has
to offer? The answer is: smaller than you might think. For queries of second or third
degree, the performance benefits show with datasets of only a few thousand
nodes. The higher the degree of the query, the larger the performance difference. The
ease-of-development benefits are of course unrelated to data volume, and available regardless