Databases Reference
In-Depth Information
of the database size. The authors have seen meaningful production applications range
from as small as a few tens of thousands of nodes, and a few hundred thousand rela‐
tionships, to billions of nodes and relationships.
Latency
Graph databases don't suffer the same latency problems as traditional relational data‐
bases, where the more data we have in tables—and in indexes—the longer the join
operations (this simple fact of life is one of the key reasons that performance tuning is
nearly always the very top issue on a relational DBA's mind). With a graph database,
most queries follow a pattern whereby an index is used simply to find a starting node
(or nodes); the remainder of the traversal then uses a combination of pointer chasing
and pattern matching to search the data store. What this means is that, unlike relational
databases, performance does not depend on the total size of the dataset, but only on the
data being queried. This leads to performance times that are nearly constant (i.e., are
related to the size of the result set), even as the size of the dataset grows (though as we
discussed in Chapter 3 , it's still sensible to tune the structure of the graph to suit the
queries, even if we're dealing with lower data volumes).
Throughput
We might think a graph database would need to scale in the same way as other databases.
But this isn't the case. When we look at IO-intensive application behaviors, we see that
a single complex business operation typically reads and writes a set of related data. In
other words, the application performs multiple operations on a logical subgraph within
the overall dataset. With a graph database such multiple operations can be rolled up
into larger, more cohesive operations. Further, with a graph-native store, executing each
operation takes less computational effort than the equivalent relational operation.
Graphs scale by doing less work for the same outcome.
For example, imagine a publishing scenario in which we'd like to read the latest piece
from an author. In a RDBMS we typically select the author's works by joining the authors
table to a table of publications based on matching author ID, and then ordering the
publications by publication date and limiting to the newest handful. Depending on the
characteristics of the ordering operation, that might be a O(log(n)) operation, which
isn't so very bad.
However, as shown in Figure 6-8 , the equivalent graph operation is O(1) , meaning con‐
stant performance irrespective of dataset size. With a graph we simply follow the out‐
bound relationship called WROTE from the author to the work at the head of a list (or
tree) of published articles. Should we wish to find older publications, we simply follow
the PREV relationships and iterate through a linked list (or, alternatively, recurse through
a tree). Writes follow suit because we always insert new publications at the head of the
list (or root of a tree), which is another constant time operation. This compares favorably
Search WWH ::




Custom Search