Databases Reference
In-Depth Information
reasonable-sized datasets, where we'd prefer an O(log n) algorithm—which is somewhat
efficient because it discards half the potential workload on each iteration—or better.
Conversely, a graph database provides constant order lookup for the same query. In this
case, we simply find the node in the graph that represents Bob, and then follow any
incoming friend relationships; these relationships lead to nodes that represent people
who consider Bob to be their friend. This is far cheaper than brute-forcing the result
because it considers far fewer members of the network; that is, it considers only those
that are connected to Bob. Of course, if everybody is friends with Bob, we'll still end up
considering the entire dataset.
To avoid having to process the entire dataset, we could denormalize the storage model
by adding backward links. Adding a second property, called perhaps friended_by , to
each user, we can list the incoming friendship relations associated with that user. But
this doesn't come for free. For starters, we have to pay the initial and ongoing cost of
increased write latency, plus the increased disk utilization cost for storing the additional
metadata. On top of that, traversing the links remains expensive, because each hop
requires an index lookup. This is because aggregates have no notion of locality, unlike
graph databases, which naturally provide index-free adjacency through real—not reified
—relationships. By implementing a graph structure atop a nonnative store, we get some
of the benefits of partial connectedness, but at substantial cost.
This substantial cost is amplified when it comes to traversing deeper than just one hop.
Friends are easy enough, but imagine trying to compute—in real time—friends-of-
friends, or friends-of-friends-of-friends. That's impractical with this kind of database
because traversing a fake relationship isn't cheap. This not only limits your chances of
expanding your social network, but it reduces profitable recommendations, misses
faulty equipment in your data center, and lets fraudulent purchasing activity slip through
the net. Many systems try to maintain the appearance of graph-like processing, but
inevitably it's done in batches and doesn't provide the real-time interaction that users
demand.
Graph Databases Embrace Relationships
The previous examples have dealt with implicitly connected data. As users we infer
semantic dependencies between entities, but the data models—and the databases them‐
selves—are blind to these connections. To compensate, our applications must create a
network out of the flat, disconnected data at hand, and then deal with any slow queries
and latent writes across denormalized stores that arise.
What we really want is a cohesive picture of the whole, including the connections be‐
tween elements. In contrast to the stores we've just looked at, in the graph world, con‐
nected data is stored as connected data . Where there are connections in the domain,
 
Search WWH ::




Custom Search