Databases Reference
In-Depth Information
The answer to this query is
Alice
; sadly,
Zach
doesn't consider
Bob
to be a friend. This
reciprocal query is still easy to implement, but on the database side it's more expensive,
because the database now has to consider all the rows in the
PersonFriend
table.
We can add an index, but this still involves an expensive layer of indirection. Things
become even more problematic when we ask, “who are the friends of my friends?”
Hierarchies in SQL use recursive joins, which make the query syntactically and com‐
putationally more complex, as shown in
Example 2-3
. (Some relational databases pro‐
vide syntactic sugar for this—for instance, Oracle has a
CONNECT BY
function—which
simplifies the query, but not the underlying computational complexity.)
Example 2-3. Alice's friends-of-friends
SELECT
p1
.
Person
AS
PERSON
,
p2
.
Person
AS
FRIEND_OF_FRIEND
FROM
PersonFriend
pf1
JOIN
Person
p1
ON
pf1
.
PersonID
=
p1
.
ID
JOIN
PersonFriend
pf2
ON
pf2
.
PersonID
=
pf1
.
FriendID
JOIN
Person
p2
ON
pf2
.
FriendID
=
p2
.
ID
WHERE
p1
.
Person
=
'Alice'
AND
pf2
.
FriendID
<>
p1
.
ID
This query is computationally complex, even though it only deals with the friends of
Alice's friends, and goes no deeper into Alice's social network. Things get more complex
and more expensive the deeper we go into the network. Though it's possible get an
answer to the question “who are my friends-of-friends-of-friends?” in a reasonable
period of time, queries that extend to four, five, or six degrees of friendship deteriorate
significantly due to the computational and space complexity of recursively joining
tables.
We work against the grain whenever we try to model and query connectedness in a
relational database. Besides the query and computational complexity just outlined, we
also have to deal with the double-edged sword of schema. More often than not, schema
proves to be both rigid and brittle. To subvert its rigidity we create sparsely populated
tables with many nullable columns, and code to handle the exceptional cases—all be‐
cause there's no real one-size-fits-all schema to accommodate the variety in the data we
encounter. This increases coupling and all but destroys any semblance of cohesion. Its
brittleness manifests itself as the extra effort and care required to migrate from one
schema to another as an application evolves.
NOSQL Databases Also Lack Relationships
Most NOSQL databases—whether key-value-, document-, or column-oriented—store
sets of disconnected documents/values/columns. This makes it difficult to use them for
connected data and graphs.