to the RDBMS alternative, particularly because it naturally maintains constant time
performance for reads.
Figure 6-8. Constant time operations for a publishing system
Of course, there'll always be situations in which a single machine won't have sufficient I/O throughput to serve all the queries directed to it. When that happens, it's straightforward with Neo4j to build a cluster that scales horizontally for high availability and high read throughput. For typical workloads, where reads outstrip writes, this solution architecture can be ideal.
Should we exceed the capacity of a cluster, we can spread a graph across database instances by building sharding logic into the application. Sharding involves the use of a synthetic identifier to join records across database instances at the application level.
How well this will perform depends very much on the shape of the graph. Some graphs
lend themselves very well to this. Mozilla, for instance, uses the Neo4j graph database
as part of its next-generation cloud browser, Pancake. Rather than having a single large
graph, it stores a large number of small independent graphs, each tied to an end user.
This makes it very easy to scale with no performance penalty.
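To make the pattern concrete, the routing logic such a design requires can be as simple as a deterministic mapping from a user identifier to the instance that holds that user's graph. The sketch below illustrates one way this might look; the UserGraphRouter class, the instance URIs, and the hash-based bucketing are illustrative assumptions, not part of any Neo4j API or of Mozilla's actual implementation.

import java.util.List;

// Illustrative only: each user's graph is self-contained, so every query for
// that user can be routed to a single Neo4j instance and never needs to
// cross instance boundaries.
public class UserGraphRouter {

    private final List<String> instanceUris;

    public UserGraphRouter(List<String> instanceUris) {
        this.instanceUris = instanceUris;
    }

    // Deterministically map a user ID to the instance that owns that user's graph.
    public String instanceFor(String userId) {
        int bucket = Math.floorMod(userId.hashCode(), instanceUris.size());
        return instanceUris.get(bucket);
    }

    public static void main(String[] args) {
        UserGraphRouter router = new UserGraphRouter(
                List.of("http://graph-1:7474", "http://graph-2:7474"));
        System.out.println(router.instanceFor("user-42"));
    }
}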
Of course, not all graphs have such convenient boundaries. If our graph is large enough that it needs to be broken up, but no natural boundaries exist, the approach we use is much the same as what we would use with a NOSQL store like MongoDB: we create synthetic keys, and relate records via the application layer using those keys plus some application-level resolution algorithm. The main difference from the MongoDB approach is that a native graph database will still provide a performance boost for any part of a traversal that runs within a single database instance, whereas the parts of the traversal that hop between instances will run at roughly the same speed as a MongoDB join. Overall performance should nonetheless be markedly better.
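The following sketch shows what such application-level resolution might look like. The GraphShard interface and its methods are assumptions standing in for whatever access layer the application uses against each Neo4j instance, and the "prefix:id" synthetic key format is just one possible choice; the point is simply that hops within a shard use a fast native traversal, while hops between shards are stitched together in the application using the synthetic keys.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper around a single Neo4j instance; these method names
// are assumptions for illustration, not part of any Neo4j driver API.
interface GraphShard {
    // Fast, native traversal confined to this instance.
    List<String> neighboursWithin(String nodeKey);

    // Synthetic keys of neighbouring nodes that live on other instances.
    List<String> foreignNeighbourKeys(String nodeKey);
}

public class ShardedTraversal {

    // Maps a synthetic-key prefix (e.g. "s2") to the shard that owns it.
    private final Map<String, GraphShard> shardByKeyPrefix;

    public ShardedTraversal(Map<String, GraphShard> shardByKeyPrefix) {
        this.shardByKeyPrefix = shardByKeyPrefix;
    }

    // Resolve which shard owns a node from its synthetic key, e.g. "s2:1234".
    private GraphShard shardFor(String syntheticKey) {
        String prefix = syntheticKey.substring(0, syntheticKey.indexOf(':'));
        return shardByKeyPrefix.get(prefix);
    }

    // One hop of a traversal: local neighbours come back from a native
    // traversal; foreign neighbours are resolved here in the application,
    // at roughly the cost of a cross-store join.
    public List<String> neighbours(String syntheticKey) {
        GraphShard local = shardFor(syntheticKey);
        List<String> result = new ArrayList<>(local.neighboursWithin(syntheticKey));
        for (String foreignKey : local.foreignNeighbourKeys(syntheticKey)) {
            result.add(foreignKey); // follow-up hops continue on shardFor(foreignKey)
        }
        return result;
    }
}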
 