who is BOSS_OF them all. We can see who's off the market, because they're MARRIED_TO
someone else; we can even see the antisocial elements in our otherwise social network,
as represented by DISLIKES relationships. With this graph at our disposal, we can now
look at the performance advantages of graph databases when dealing with connected
data.
Relationships in a graph naturally form paths. Querying, or traversing, the graph
involves following paths. Because of the fundamentally path-oriented nature of the data
model, the majority of path-based graph database operations are highly aligned with
the way in which the data is laid out, making them extremely efficient. In their book
Neo4j in Action, Partner and Vukotic perform an experiment using a relational store
and Neo4j. The comparison shows that the graph database is substantially quicker for
connected data than a relational store.
Partner and Vukotic's experiment seeks to find friends-of-friends in a social network,
to a maximum depth of five. Given any two persons chosen at random, is there a path
that connects them that is at most five relationships long? For a social network
containing 1,000,000 people, each with approximately 50 friends, the results strongly
suggest that graph databases are the best choice for connected data, as we see in Table 2-1.
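The question the experiment asks, whether two people are connected by a path of at most five relationships, can be sketched as a depth-bounded breadth-first search. The following is a minimal illustration over an in-memory adjacency dictionary, not the code Partner and Vukotic ran; the function name connected_within and the toy friends graph are assumptions made for this example.

```python
from collections import deque


def connected_within(graph, start, goal, max_depth=5):
    """Return True if goal is reachable from start in at most max_depth hops.

    graph is a hypothetical adjacency mapping: person -> iterable of friends.
    """
    if start == goal:
        return True
    visited = {start}
    frontier = deque([(start, 0)])  # (person, hops taken to reach them)
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in graph.get(person, ()):
            if friend == goal:
                return True
            if friend not in visited:
                visited.add(friend)
                frontier.append((friend, depth + 1))
    return False


# Toy data for illustration only: a chain Alice - Bob - Carol - Dave.
friends = {
    "Alice": ["Bob"],
    "Bob": ["Alice", "Carol"],
    "Carol": ["Bob", "Dave"],
    "Dave": ["Carol"],
}

print(connected_within(friends, "Alice", "Dave", max_depth=3))
print(connected_within(friends, "Alice", "Dave", max_depth=2))
```

On the toy graph, Alice and Dave are three relationships apart, so the search succeeds with a limit of 3 and fails with a limit of 2.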
Table 2-1. Finding extended friends in a relational database versus efficient finding in Neo4j

Depth  RDBMS execution time (s)  Neo4j execution time (s)  Records returned
2      0.016                     0.01                      ~2,500
3      30.267                    0.168                     ~110,000
4      1543.505                  1.359                     ~600,000
5      Unfinished                2.132                     ~800,000

At depth two (friends-of-friends), both the relational database and the graph database
perform well enough for us to consider using them in an online system. Although the
Neo4j query runs in two-thirds the time of the relational one, an end user would barely
notice the difference in milliseconds between the two. By the time we reach depth three
(friend-of-friend-of-friend), however, it's clear that the relational database can no longer
deal with the query in a reasonable time frame: the 30 seconds it takes to complete would
be completely unacceptable for an online system. In contrast, Neo4j's response time
remains relatively flat: just a fraction of a second to perform the query—definitely quick
enough for an online system.
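To put the gap in perspective, the ratio between the two timing columns of Table 2-1 can be computed directly. The short sketch below uses only the timings reported in the table; the names timings and speedup are illustrative, and depth five is left out because the relational run did not finish.

```python
# Timings in seconds, copied from Table 2-1 as (RDBMS, Neo4j) pairs.
# Depth 5 is omitted: the RDBMS query is reported as unfinished.
timings = {
    2: (0.016, 0.01),
    3: (30.267, 0.168),
    4: (1543.505, 1.359),
}


def speedup(depth):
    """Return how many times longer the RDBMS took than Neo4j at this depth."""
    rdbms_seconds, neo4j_seconds = timings[depth]
    return rdbms_seconds / neo4j_seconds


for depth in sorted(timings):
    print(f"depth {depth}: RDBMS took about {speedup(depth):.1f}x as long as Neo4j")
```

The ratio grows from under 2 at depth two to roughly 180 at depth three and over 1,100 at depth four.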
At depth four the relational database exhibits crippling latency, making it practically
useless for an online system. Neo4j's timings have deteriorated a little too, but the latency
here is at the periphery of being acceptable for a responsive online system. Finally, at
depth five, the relational database simply takes too long to complete the query. Neo4j,
in contrast, returns a result in around two seconds. At depth five, it turns out that almost
 