• WITH pipes the results to the RETURN clause, filtering out redundant paths as it does so. Redundant paths are present in the results at this point because colleagues and colleagues-of-colleagues are often reachable through different paths, some longer than others. We want to filter these longer paths out. That's exactly what the WITH clause does. The WITH clause emits triples comprising a person, an interest, and the length of the path from the subject of the query through the person to his interest. Given that any particular person/interest combination may appear more than once in the results, but with different path lengths, we want to aggregate these multiple lines by collapsing them to a triple containing only the shortest path, which we do using min(length(p)) as pathLength.
• RETURN creates a projection of the data, performing more aggregation as it does so. The data piped by the WITH clause to RETURN contains one entry per person per interest: if a person matches two of the supplied interests, there will be two separate data entries. We aggregate these entries using count and collect: count to create an overall score for a person, collect to create a comma-separated list of matched interests for that person. As part of the results, we also calculate how far the matched person is from the subject of the query: we take the pathLength for that person, subtract one (for the INTERESTED_IN relationship at the end of the path), and then divide by two (because the person is separated from the subject by pairs of WORKED_ON relationships). Finally, we order the results based on score, highest score first, and limit them according to a resultLimit parameter supplied by the query's client.
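The two aggregation steps described above can be mimicked outside the database to make the arithmetic concrete. The following is only a sketch in Python, not the query itself: the function name recommend, the row layout, and the sample names and path lengths are all invented for illustration. It assumes each input row is a (person, interest, path length) triple, as the WITH clause is described as receiving, and that the distance calculation uses integer division.

```python
from collections import defaultdict


def recommend(rows, result_limit):
    """Mimic the WITH and RETURN steps on (person, interest, path_length) rows."""
    # WITH step: keep only the shortest path length per person/interest pair.
    shortest = {}
    for person, interest, length in rows:
        key = (person, interest)
        shortest[key] = min(length, shortest.get(key, length))

    # RETURN step: group by person and distance, collecting matched interests.
    # distance = (pathLength - 1) / 2: drop the final INTERESTED_IN hop,
    # then halve, because people are separated by pairs of WORKED_ON hops.
    grouped = defaultdict(list)
    for (person, interest), length in shortest.items():
        distance = (length - 1) // 2
        grouped[(person, distance)].append(interest)

    results = [
        {
            "name": person,
            "score": len(interests),   # plays the role of count(interest)
            "interests": interests,    # plays the role of collect(interest.name)
            "distance": distance,
        }
        for (person, distance), interests in grouped.items()
    ]

    # ORDER BY score DESC, then LIMIT resultLimit.
    results.sort(key=lambda row: row["score"], reverse=True)
    return results[:result_limit]


# Invented sample rows: Ann/Java is reachable by a short and a long path.
sample_rows = [
    ("Ann", "Java", 3),
    ("Ann", "Java", 5),
    ("Ann", "Travel", 3),
    ("Bob", "Medicine", 5),
    ("Bob", "Medicine", 5),
]

for row in recommend(sample_rows, result_limit=10):
    print(row)
```

With these invented rows, Ann's longer path to Java is discarded, leaving her with a score of two at distance one, while Bob scores one at distance two.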
The MATCH clause in the preceding query uses a variable-length path, [:WORKED_ON*0..2], as part of a larger pattern to match people who have worked directly with the subject of the query, as well as people who have worked on the same project as people who have worked with the subject. Because each person is separated from the subject of the query by one or two pairs of WORKED_ON relationships, Talent.net could have written this portion of the query as MATCH p=(subject)-[:WORKED_ON*2..4]-(person)-[:INTERESTED_IN]->(interest), with a variable-length path of between two and four WORKED_ON relationships. However, long variable-length paths can be relatively inefficient. When writing such queries, it is advisable to restrict variable-length paths to as narrow a scope as possible. To increase the performance of the query, Talent.net uses a fixed-length outgoing WORKED_ON relationship that extends from the subject to her first project, and another fixed-length WORKED_ON relationship that connects the matched person to a project, with a smaller variable-length path in between.
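To see why the two formulations cover the same range of WORKED_ON relationships, the hop counts can be added up directly. The snippet below is a small arithmetic sketch in Python, with variable names chosen here for illustration; it counts relationships only and says nothing about how the database actually evaluates either pattern.

```python
# One fixed WORKED_ON from the subject, a variable run of 0..2 in the middle,
# and one fixed WORKED_ON into the matched person.
FIXED_START = 1
FIXED_END = 1
variable_hops = range(0, 3)  # *0..2 includes both 0 and 2

split_pattern_lengths = [FIXED_START + k + FIXED_END for k in variable_hops]

# The single variable-length alternative, *2..4, also inclusive at both ends.
single_pattern_lengths = list(range(2, 5))

print(split_pattern_lengths)
print(single_pattern_lengths)
```

Both lists contain the same hop counts, two through four, so the split pattern narrows the variable-length portion without changing how many WORKED_ON relationships can separate the subject from a matched person.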
Running this query against our sample graph, and again taking Sarah as the subject of
the query, if we look for colleagues and colleagues-of-colleagues who have interests in
Java, travel, or medicine, we get the following results: