Options for Storing Connected Data - Graph Databases

who is BOSS_OF them all. We can see who's off the market, because they're MARRIED_TO

someone else; we can even see the antisocial elements in our otherwise social network,

as represented by DISLIKES relationships. With this graph at our disposal, we can now

look at the performance advantages of graph databases when dealing with connected

data.

Relationships in a graph naturally form paths. Querying, or traversing, the graph involves

following paths. Because of the fundamentally path-oriented nature of the data

model, the majority of path-based graph database operations are highly aligned with

the way in which the data is laid out, making them extremely efficient. In their book

Neo4j in Action, Partner and Vukotic perform an experiment using a relational store

and Neo4j. The comparison shows that the graph database is substantially quicker for

connected data than a relational store.

Partner and Vukotic's experiment seeks to find friends-of-friends in a social network,

to a maximum depth of five. Given any two persons chosen at random, is there a path

that connects them that is at most five relationships long? For a social network containing

1,000,000 people, each with approximately 50 friends, the results strongly suggest

that graph databases are the best choice for connected data, as we see in Table 2-1.

Table 2-1. Finding extended friends in a relational database versus efficient finding in Neo4j

Depth  RDBMS execution time (s)  Neo4j execution time (s)  Records returned
2      0.016                     0.01                      ~2,500
3      30.267                    0.168                     ~110,000
4      1543.505                  1.359                     ~600,000
5      Unfinished                2.132                     ~800,000
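
The question behind these measurements (is there a path of at most a given number of relationships between two people?) can be sketched as a depth-bounded breadth-first traversal. The snippet below is only an illustration over an in-memory adjacency map; it is not the benchmark code from the experiment, nor how Neo4j or a relational store executes the query. The function name `connected_within` and the toy `friends` data are hypothetical.

```python
from collections import deque


def connected_within(friends, start, target, max_depth):
    """Return True if target is reachable from start in at most max_depth hops.

    friends is assumed to map each person to a list of their direct friends.
    """
    if start == target:
        return True
    visited = {start}
    frontier = deque([(start, 0)])  # (person, hops taken so far)
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not follow relationships past the depth limit
        for friend in friends.get(person, ()):
            if friend == target:
                return True
            if friend not in visited:
                visited.add(friend)
                frontier.append((friend, depth + 1))
    return False


# Hypothetical toy network: a simple chain of five people.
friends = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol", "erin"],
    "erin": ["dave"],
}

print(connected_within(friends, "alice", "carol", 2))
print(connected_within(friends, "alice", "erin", 3))
```

Each step of the loop follows relationships outward from people already reached. The work done depends on how much of the neighborhood is explored, not on the total size of the network.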

At depth two (friends-of-friends), both the relational database and the graph database

perform well enough for us to consider using them in an online system. Although the

Neo4j query runs in two-thirds the time of the relational one, an end user would barely

notice the difference in milliseconds between the two. By the time we reach depth three

(friend-of-friend-of-friend), however, it's clear that the relational database can no longer

deal with the query in a reasonable time frame: the 30 seconds it takes to complete would

be completely unacceptable for an online system. In contrast, Neo4j's response time

remains relatively flat: just a fraction of a second to perform the query—definitely quick

enough for an online system.
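
To put the widening gap in Table 2-1 into numbers, a short calculation of the ratio between the two execution times at each depth is enough. The timings are copied from the table; the variable names are illustrative only. Depth five is omitted because the relational query did not finish.

```python
# Execution times in seconds from Table 2-1: depth -> (RDBMS, Neo4j).
timings = {
    2: (0.016, 0.01),
    3: (30.267, 0.168),
    4: (1543.505, 1.359),
}

# Ratio of relational time to Neo4j time at each depth.
ratios = {depth: rdbms / neo4j for depth, (rdbms, neo4j) in timings.items()}

for depth, ratio in sorted(ratios.items()):
    print(f"depth {depth}: relational query takes about {ratio:.1f}x as long")
```

The ratio grows from under 2x at depth two to roughly 180x at depth three and more than 1,000x at depth four.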

At depth four the relational database exhibits crippling latency, making it practically

useless for an online system. Neo4j's timings have deteriorated a little too, but the latency

here is still on the borderline of being acceptable for a responsive online system. Finally, at

depth five, the relational database simply takes too long to complete the query. Neo4j,

in contrast, returns a result in around two seconds. At depth five, it turns out that almost
