results, by at least two orders of magnitude. The SQL query for all friends at depth 5 did not finish within the hour we let the script run.
Note
Storing the sample data for these examples takes a significant amount of disk space. To generate the data and run the examples against it, you'll need in excess of 10 GB of disk space available.
These results clearly show that the MySQL relational database is optimized for single-join queries, even on large data sets. The performance of multiple-join queries on large data sets degrades significantly, to the point that some queries aren't even executable (for example, friends at depth 5 for a data set of 1 million users).
Why are relational database queries so slow?
The results in table 1.3 are somewhat expected, given the way join operations work. As we discussed earlier, each join creates a Cartesian product of all potential combinations of rows and then filters out those that don't match the where clause. With 1 million users, the Cartesian product of 5 joins (equivalent to a query at depth 5) contains billions of rows, far too many to enumerate. Filtering out all the records that don't match the query is so expensive that the SQL query at depth 5 never finishes in a reasonable time.
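To make the cost concrete, here's a sketch of what such a depth-5 query looks like in SQL. The schema is an assumption for illustration: a friendship table with user_1 and user_2 columns, one row per friendship. The table and column names are hypothetical, not the exact ones used in our test script.

    -- Assumed schema: friendship(user_1, user_2), one row per friendship.
    -- Friends of user 1 at depth 5: a chain of five self-joins over the same table.
    SELECT COUNT(DISTINCT f5.user_2)
    FROM friendship f1
    JOIN friendship f2 ON f2.user_1 = f1.user_2
    JOIN friendship f3 ON f3.user_1 = f2.user_2
    JOIN friendship f4 ON f4.user_1 = f3.user_2
    JOIN friendship f5 ON f5.user_1 = f4.user_2
    WHERE f1.user_1 = 1;

Each additional join multiplies the number of candidate row combinations the database must consider before the WHERE clause and DISTINCT can prune them, which is why execution time grows so sharply with depth.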
We repeated the same experiment with Neo4j traversals. We had 1 million nodes representing users, and approximately 50 million relationships stored in Neo4j. We ran the same four traversals as in the previous example, and we got the performance results in table 1.4.
Table 1.4. The execution times for graph traversal using Neo4j on a data set of 1 million users

Depth    Execution time (seconds) for 1 million users    Count result
2        0.01                                            ~2,500
3        0.168                                           ~110,000
4        1.359                                           ~600,000
5        2.132                                           ~800,000
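For reference, a traversal like the one we timed can be expressed with Neo4j's embedded Java Traversal API. The sketch below assumes a Neo4j 3.x embedded database and a relationship type named IS_FRIEND_OF; those names, and the FriendCounter class itself, are illustrative assumptions rather than the exact code of our test harness.

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.RelationshipType;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.graphdb.traversal.Evaluators;
    import org.neo4j.graphdb.traversal.TraversalDescription;
    import org.neo4j.graphdb.traversal.Uniqueness;

    public class FriendCounter {
        // Counts the distinct users reachable within `depth` hops of `start`.
        public static long countFriends(GraphDatabaseService graphDb, Node start, int depth) {
            TraversalDescription friends = graphDb.traversalDescription()
                .breadthFirst()                                           // expand level by level
                .relationships(RelationshipType.withName("IS_FRIEND_OF")) // assumed relationship type
                .evaluator(Evaluators.toDepth(depth))                     // stop expanding past `depth`
                .evaluator(Evaluators.excludeStartPosition())             // don't count the user themselves
                .uniqueness(Uniqueness.NODE_GLOBAL);                      // visit each node at most once

            long count = 0;
            try (Transaction tx = graphDb.beginTx()) {
                for (Node friend : friends.traverse(start).nodes()) {
                    count++;
                }
                tx.success();
            }
            return count;
        }
    }

The key difference from the SQL approach is that the traversal only ever touches nodes it can actually reach from the start node, so the work done is proportional to the size of the result rather than to the Cartesian product of the joined tables.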