Graph Database Internals - Graph Databases

Databases Reference

In-Depth Information

of the database size. The authors have seen meaningful production applications range

from as small as a few tens of thousands of nodes, and a few hundred thousand rela‐

tionships, to billions of nodes and relationships.

Latency

Graph databases don't suffer the same latency problems as traditional relational data‐

bases, where the more data we have in tables—and in indexes—the longer the join

operations (this simple fact of life is one of the key reasons that performance tuning is

nearly always the very top issue on a relational DBA's mind). With a graph database,

most queries follow a pattern whereby an index is used simply to find a starting node

(or nodes); the remainder of the traversal then uses a combination of pointer chasing

and pattern matching to search the data store. What this means is that, unlike relational

databases, performance does not depend on the total size of the dataset, but only on the

data being queried. This leads to performance times that are nearly constant (i.e., are

related to the size of the result set), even as the size of the dataset grows (though as we

discussed in Chapter 3 , it's still sensible to tune the structure of the graph to suit the

queries, even if we're dealing with lower data volumes).

Throughput

We might think a graph database would need to scale in the same way as other databases.

But this isn't the case. When we look at IO-intensive application behaviors, we see that

a single complex business operation typically reads and writes a set of related data. In

other words, the application performs multiple operations on a logical subgraph within

the overall dataset. With a graph database such multiple operations can be rolled up

into larger, more cohesive operations. Further, with a graph-native store, executing each

operation takes less computational effort than the equivalent relational operation.

Graphs scale by doing less work for the same outcome.

For example, imagine a publishing scenario in which we'd like to read the latest piece

from an author. In a RDBMS we typically select the author's works by joining the authors

table to a table of publications based on matching author ID, and then ordering the

publications by publication date and limiting to the newest handful. Depending on the

characteristics of the ordering operation, that might be a O(log(n)) operation, which

isn't so very bad.

However, as shown in Figure 6-8 , the equivalent graph operation is O(1) , meaning con‐

stant performance irrespective of dataset size. With a graph we simply follow the out‐

bound relationship called WROTE from the author to the work at the head of a list (or

tree) of published articles. Should we wish to find older publications, we simply follow

the PREV relationships and iterate through a linked list (or, alternatively, recurse through

a tree). Writes follow suit because we always insert new publications at the head of the

list (or root of a tree), which is another constant time operation. This compares favorably

Search WWH ::

Custom Search

Home