Options for Storing Connected Data - Graph Databases

Databases Reference

In-Depth Information

The answer to this query is Alice ; sadly, Zach doesn't consider Bob to be a friend. This

reciprocal query is still easy to implement, but on the database side it's more expensive,

because the database now has to consider all the rows in the PersonFriend table.

We can add an index, but this still involves an expensive layer of indirection. Things

become even more problematic when we ask, “who are the friends of my friends?”

Hierarchies in SQL use recursive joins, which make the query syntactically and com‐

putationally more complex, as shown in Example 2-3 . (Some relational databases pro‐

vide syntactic sugar for this—for instance, Oracle has a CONNECT BY function—which

simplifies the query, but not the underlying computational complexity.)

Example 2-3. Alice's friends-of-friends

SELECT p1 . Person AS PERSON , p2 . Person AS FRIEND_OF_FRIEND

FROM PersonFriend pf1 JOIN Person p1

ON pf1 . PersonID = p1 . ID

JOIN PersonFriend pf2

ON pf2 . PersonID = pf1 . FriendID

JOIN Person p2

ON pf2 . FriendID = p2 . ID

WHERE p1 . Person = 'Alice' AND pf2 . FriendID <> p1 . ID

This query is computationally complex, even though it only deals with the friends of

Alice's friends, and goes no deeper into Alice's social network. Things get more complex

and more expensive the deeper we go into the network. Though it's possible get an

answer to the question “who are my friends-of-friends-of-friends?” in a reasonable

period of time, queries that extend to four, five, or six degrees of friendship deteriorate

significantly due to the computational and space complexity of recursively joining

tables.

We work against the grain whenever we try to model and query connectedness in a

relational database. Besides the query and computational complexity just outlined, we

also have to deal with the double-edged sword of schema. More often than not, schema

proves to be both rigid and brittle. To subvert its rigidity we create sparsely populated

tables with many nullable columns, and code to handle the exceptional cases—all be‐

cause there's no real one-size-fits-all schema to accommodate the variety in the data we

encounter. This increases coupling and all but destroys any semblance of cohesion. Its

brittleness manifests itself as the extra effort and care required to migrate from one

schema to another as an application evolves.

NOSQL Databases Also Lack Relationships

Most NOSQL databases—whether key-value-, document-, or column-oriented—store

sets of disconnected documents/values/columns. This makes it difficult to use them for

connected data and graphs.

Search WWH ::

Custom Search

Home