Options for Storing Connected Data - Graph Databases

Databases Reference

In-Depth Information

reasonable-sized datasets, where we'd prefer an O(log n) algorithm—which is somewhat

efficient because it discards half the potential workload on each iteration—or better.

Conversely, a graph database provides constant order lookup for the same query. In this

case, we simply find the node in the graph that represents Bob, and then follow any

incoming friend relationships; these relationships lead to nodes that represent people

who consider Bob to be their friend. This is far cheaper than brute-forcing the result

because it considers far fewer members of the network; that is, it considers only those

that are connected to Bob. Of course, if everybody is friends with Bob, we'll still end up

considering the entire dataset.

To avoid having to process the entire dataset, we could denormalize the storage model

by adding backward links. Adding a second property, called perhaps friended_by , to

each user, we can list the incoming friendship relations associated with that user. But

this doesn't come for free. For starters, we have to pay the initial and ongoing cost of

increased write latency, plus the increased disk utilization cost for storing the additional

metadata. On top of that, traversing the links remains expensive, because each hop

requires an index lookup. This is because aggregates have no notion of locality, unlike

graph databases, which naturally provide index-free adjacency through real—not reified

—relationships. By implementing a graph structure atop a nonnative store, we get some

of the benefits of partial connectedness, but at substantial cost.

This substantial cost is amplified when it comes to traversing deeper than just one hop.

Friends are easy enough, but imagine trying to compute—in real time—friends-of-

friends, or friends-of-friends-of-friends. That's impractical with this kind of database

because traversing a fake relationship isn't cheap. This not only limits your chances of

expanding your social network, but it reduces profitable recommendations, misses

faulty equipment in your data center, and lets fraudulent purchasing activity slip through

the net. Many systems try to maintain the appearance of graph-like processing, but

inevitably it's done in batches and doesn't provide the real-time interaction that users

demand.

Graph Databases Embrace Relationships

The previous examples have dealt with implicitly connected data. As users we infer

semantic dependencies between entities, but the data models—and the databases them‐

selves—are blind to these connections. To compensate, our applications must create a

network out of the flat, disconnected data at hand, and then deal with any slow queries

and latent writes across denormalized stores that arise.

What we really want is a cohesive picture of the whole, including the connections be‐

tween elements. In contrast to the stores we've just looked at, in the graph world, con‐

nected data is stored as connected data . Where there are connections in the domain,

Search WWH ::

Custom Search

Home