Graph Database Internals - Graph Databases


at O(1) cost each. To find who is friends with Alice, we simply follow all of Alice's

incoming FRIEND relationships to their source, again at O(1) cost each.

Given these costings, it's clear that, in theory at least, graph traversals can be very efficient. But such high-performance traversals only become reality when they are supported by an architecture designed for that purpose.
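To make the costing concrete, here is a minimal in-memory sketch of index-free adjacency. It is not Neo4j's storage format or API: the `Node` and `Relationship` classes and the `connect`, `friends_of`, and `friends_with` helpers are hypothetical names chosen for illustration, and Python object references stand in for the direct pointers a native store would keep. Each node holds its own relationships, so each hop is a pointer dereference rather than a lookup in a global index.

```python
class Node:
    """A node that holds direct references to its own relationships."""
    def __init__(self, name):
        self.name = name
        self.outgoing = []  # relationships that start at this node
        self.incoming = []  # relationships that end at this node


class Relationship:
    """A typed, directed edge with direct references to both end nodes."""
    def __init__(self, rel_type, source, target):
        self.rel_type = rel_type
        self.source = source
        self.target = target


def connect(source, rel_type, target):
    """Create a relationship and register it on both end nodes."""
    rel = Relationship(rel_type, source, target)
    source.outgoing.append(rel)
    target.incoming.append(rel)
    return rel


def friends_of(node):
    """Follow the node's outgoing FRIEND relationships to their targets."""
    return [r.target.name for r in node.outgoing if r.rel_type == "FRIEND"]


def friends_with(node):
    """Follow the node's incoming FRIEND relationships to their sources."""
    return [r.source.name for r in node.incoming if r.rel_type == "FRIEND"]


alice, bob, zach = Node("Alice"), Node("Bob"), Node("Zach")
connect(alice, "FRIEND", bob)
connect(bob, "FRIEND", alice)
connect(zach, "FRIEND", alice)

print(friends_of(alice))    # Alice's friends
print(friends_with(alice))  # who is friends with Alice
```

Neither query consults a global index: the work is proportional to the number of relationships attached to Alice, not to the size of the graph.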

Native Graph Storage

If index-free adjacency is the key to high-performance traversals, queries, and writes, then one key aspect of the design of a graph database is the way in which graphs are stored. An efficient, native graph storage format supports extremely rapid traversals for arbitrary graph algorithms, which is an important reason for using graphs. For illustrative purposes we'll use the Neo4j database as an example of how a graph database is architected.

First, let's contextualize our discussion by looking at Neo4j's high-level architecture,

presented in Figure 6-3. In what follows we'll work bottom-up, from the files on disk,

through the programmatic APIs, and up to the Cypher query language. Along the way

we'll discuss the performance and dependability characteristics of Neo4j, and the design

decisions that make Neo4j a performant, reliable graph database.

Figure 6-3. Neo4j architecture

Neo4j stores graph data in a number of different store files. Each store file contains the

data for a specific part of the graph (e.g., nodes, relationships, properties). The division

of storage responsibilities—particularly the separation of graph structure from property

data—facilitates performant graph traversals, even though it means the user's view of
