WITH pipes the results to the RETURN clause, filtering out redundant paths as it does
so. Redundant paths are present in the results at this point because colleagues and
colleagues-of-colleagues are often reachable through different paths, some longer
than others. We want to filter these longer paths out. That's exactly what the WITH
clause does. The WITH clause emits triples comprising a person, an interest, and the
length of the path from the subject of the query through the person to his interest.
Given that any particular person/interest combination may appear more than once
in the results, but with different path lengths, we want to aggregate these multiple
lines by collapsing them to a triple containing only the shortest path, which we do
using min(length(p)) as pathLength.
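The collapsing step can be sketched outside Cypher. The following Python is only an illustration of the aggregation idea, not the query itself; the rows, names, and the helper shortest_per_pair are hypothetical.

```python
# Hypothetical rows standing in for the (person, interest, path length) matches
# that reach the WITH clause; names and lengths are made up for illustration.
rows = [
    ("Ann", "Java", 5),
    ("Ann", "Java", 3),    # same person/interest pair, reached by a shorter path
    ("Ann", "travel", 3),
    ("Bob", "Java", 5),
]


def shortest_per_pair(rows):
    """Collapse duplicate (person, interest) pairs, keeping the minimum path length."""
    shortest = {}
    for person, interest, length in rows:
        key = (person, interest)
        if key not in shortest or length < shortest[key]:
            shortest[key] = length
    return shortest


triples = shortest_per_pair(rows)
```

Each person/interest pair survives exactly once, carrying only its shortest path length, which is roughly what grouping by person and interest with min(length(p)) achieves.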
RETURN creates a projection of the data, performing more aggregation as it does so.
The data piped by the WITH clause to RETURN contains one entry per person per
interest: if a person matches two of the supplied interests, there will be two separate
data entries. We aggregate these entries using count and collect: count to create
an overall score for a person, collect to create a comma-separated list of matched
interests for that person. As part of the results, we also calculate how far the matched
person is from the subject of the query: we take the pathLength for that person,
subtract one (for the INTERESTED_IN relationship at the end of the path), and then
divide by two (because the person is separated from the subject by pairs of
WORKED_ON relationships). Finally, we order the results based on score, highest
score first, and limit them according to a resultLimit parameter supplied by the
query's client.
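As a rough sketch of the projection described above, the following Python groups the triples by person, scores each person, derives the distance, and applies the ordering and limit. The data and the function rank_people are hypothetical, and two simplifications are assumed: one path length is taken per person (the shortest), and integer division is used for the distance.

```python
from collections import defaultdict

# Hypothetical (person, interest) -> shortest path length triples, as the WITH
# step might emit them; this is not the book's sample data.
triples = {
    ("Ann", "Java"): 3,
    ("Ann", "travel"): 3,
    ("Bob", "Java"): 5,
}


def rank_people(triples, result_limit):
    """Group by person, score by number of matched interests, and rank."""
    grouped = defaultdict(list)
    for (person, interest), path_length in triples.items():
        grouped[person].append((interest, path_length))

    results = []
    for person, matches in grouped.items():
        interests = [interest for interest, _ in matches]  # plays the role of collect
        score = len(interests)                             # plays the role of count
        # Assumption: use the shortest path length recorded for this person.
        path_length = min(length for _, length in matches)
        # Drop the final INTERESTED_IN hop, then halve the remaining WORKED_ON hops.
        distance = (path_length - 1) // 2
        results.append(
            {"name": person, "score": score, "interests": interests, "distance": distance}
        )

    # Highest score first, then cap the number of rows returned.
    results.sort(key=lambda row: row["score"], reverse=True)
    return results[:result_limit]


ranked = rank_people(triples, result_limit=10)
```

With this made-up input, a path length of 3 gives a distance of 1 (a direct colleague) and a path length of 5 gives a distance of 2 (a colleague of a colleague).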
The MATCH clause in the preceding query uses a variable-length path,
[:WORKED_ON*0..2], as part of a larger pattern to match people who have worked directly
with the subject of the query, as well as people who have worked on the same
project as people who have worked with the subject. Because each person is separated
from the subject of the query by one or two pairs of WORKED_ON relationships, Talent.net
could have written this portion of the query as
MATCH p=(subject)-[:WORKED_ON*2..4]-(person)-[:INTERESTED_IN]->(interest), with a
variable-length path of between two and four WORKED_ON relationships. However, long
variable-length paths can be relatively inefficient. When writing such queries, it is advisable to
restrict variable-length paths to as narrow a scope as possible. To increase the performance
of the query, Talent.net uses a fixed-length outgoing WORKED_ON relationship that
extends from the subject to her first project, and another fixed-length WORKED_ON relationship
that connects the matched person to a project, with a smaller variable-length
path in between.
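The hop arithmetic behind the two formulations can be checked with a few lines of Python. This sketch only confirms that the three-part pattern spans the same totals as the single 2..4 path; it says nothing about performance, and the constant and function names are hypothetical.

```python
# Hop counts taken from the patterns described above.
FIXED_FROM_SUBJECT = 1   # fixed WORKED_ON hop from the subject to her first project
VARIABLE_MIDDLE = range(0, 3)  # the [:WORKED_ON*0..2] section: 0, 1, or 2 hops
FIXED_TO_PERSON = 1      # fixed WORKED_ON hop connecting the matched person to a project


def worked_on_lengths():
    """Total WORKED_ON hops that the three-part pattern can span."""
    return sorted(
        {FIXED_FROM_SUBJECT + middle + FIXED_TO_PERSON for middle in VARIABLE_MIDDLE}
    )


print(worked_on_lengths())  # [2, 3, 4]
```

The totals match the bounds of [:WORKED_ON*2..4], while the variable-length part of the pattern shrinks from at most four hops to at most two.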
Running this query against our sample graph, and again taking Sarah as the subject of
the query, if we look for colleagues and colleagues-of-colleagues who have interests in
Java, travel, or medicine, we get the following results: