Graphs in the Real World - Graph Databases

• WITH pipes the results to the RETURN clause, filtering out redundant paths as it does so. Redundant paths are present in the results at this point because colleagues and colleagues-of-colleagues are often reachable through different paths, some longer than others. We want to filter these longer paths out, and that is exactly what the WITH clause does. The WITH clause emits triples comprising a person, an interest, and the length of the path from the subject of the query through the person to that person's interest. Because any particular person/interest combination may appear more than once in the results, with different path lengths, we aggregate these multiple rows by collapsing them into a single triple containing only the shortest path length, which we do using min(length(p)) as pathLength.

• RETURN creates a projection of the data, performing more aggregation as it does so. The data piped by the WITH clause to RETURN contains one entry per person per interest: if a person matches two of the supplied interests, there will be two separate data entries. We aggregate these entries using count and collect: count to create an overall score for a person, and collect to gather that person's matched interests into a list. As part of the results, we also calculate how far the matched person is from the subject of the query. We take the pathLength for that person, subtract one (for the INTERESTED_IN relationship at the end of the path), and then divide by two (because the person is separated from the subject by pairs of WORKED_ON relationships). For a colleague-of-a-colleague, for instance, the path comprises four WORKED_ON relationships plus one INTERESTED_IN relationship, so pathLength is 5 and the distance is (5 - 1) / 2 = 2. Finally, we order the results by score, highest score first, and limit them according to a resultLimit parameter supplied by the query's client.
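
The following is a minimal plain-Python model of what the WITH and RETURN clauses compute. It is a sketch, not Cypher and not Talent.net's code. The rows in MATCHED_ROWS and the helper names with_shortest and return_ranked are hypothetical, and the input stands in for whatever (person, interest, pathLength) rows the MATCH clause might emit. It also assumes, as the text describes, that the distance is computed in the RETURN clause alongside the aggregates.

```python
from collections import defaultdict

# Hypothetical rows standing in for what the MATCH clause might produce:
# (person, interest, pathLength). Illustrative data, not the book's graph.
MATCHED_ROWS = [
    ("Alice", "Java", 3),
    ("Alice", "Java", 5),    # same person/interest, reached by a longer path
    ("Alice", "travel", 3),
    ("Bob", "medicine", 5),
]


def with_shortest(rows):
    """Model of WITH person, interest, min(length(p)) AS pathLength."""
    shortest = {}
    for person, interest, length in rows:
        key = (person, interest)
        # Keep only the shortest path length for each person/interest pair.
        if key not in shortest or length < shortest[key]:
            shortest[key] = length
    return [(person, interest, length)
            for (person, interest), length in shortest.items()]


def return_ranked(triples, result_limit):
    """Model of RETURN with count, collect, distance, ORDER BY and LIMIT."""
    groups = defaultdict(list)
    for person, interest, path_length in triples:
        # Drop the trailing INTERESTED_IN hop, then count pairs of WORKED_ON hops.
        distance = (path_length - 1) // 2
        # In Cypher, non-aggregated RETURN expressions (here the name and the
        # distance) act as grouping keys, so this model groups on both.
        groups[(person, distance)].append(interest)
    results = [
        {"name": person, "score": len(interests),
         "interests": interests, "distance": distance}
        for (person, distance), interests in groups.items()
    ]
    # ORDER BY score DESC, then LIMIT resultLimit.
    results.sort(key=lambda row: row["score"], reverse=True)
    return results[:result_limit]


if __name__ == "__main__":
    for row in return_ranked(with_shortest(MATCHED_ROWS), result_limit=10):
        print(row)
```

With this data, Alice comes first with a score of 2, the interests Java and travel, and a distance of 1. Bob follows with a score of 1 and a distance of 2. Because the distance is part of the grouping key, a person whose matched interests sit at different distances would appear in more than one row.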

The MATCH clause in the preceding query uses a variable-length path, [:WORKED_ON*0..2], as part of a larger pattern to match people who have worked directly with the subject of the query, as well as people who have worked on the same project as people who have worked with the subject. Because each person is separated from the subject of the query by one or two pairs of WORKED_ON relationships, Talent.net could have written this portion of the query as MATCH p=(subject)-[:WORKED_ON*2..4]-(person)-[:INTERESTED_IN]->(interest), with a variable-length path of between two and four WORKED_ON relationships. However, long variable-length paths can be relatively inefficient, because the number of candidate paths the database must explore can grow rapidly with each additional hop. When writing such queries, it is advisable to restrict variable-length paths to as narrow a scope as possible. To increase the performance of the query, Talent.net uses a fixed-length outgoing WORKED_ON relationship that extends from the subject to her first project, and another fixed-length WORKED_ON relationship that connects the matched person to a project, with a smaller variable-length path in between.
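
We can check on a toy graph that the anchored pattern only ever separates a person from the subject by one or two pairs of WORKED_ON relationships. The sketch below is a hypothetical plain-Python enumeration, not how Neo4j evaluates the pattern. The WORKED_ON list, the people Tom and Priya, the project names, and the helper functions are all invented for illustration, and Sarah is simply reused as the subject. It walks (subject)-[:WORKED_ON]->()-[:WORKED_ON*0..2]-()<-[:WORKED_ON]-(person), never reusing a relationship within one path, and records how many WORKED_ON relationships each match uses.

```python
# Hypothetical toy graph (not the book's sample data): each WORKED_ON
# relationship is a (person, project) pair pointing person -> project.
WORKED_ON = [
    ("Sarah", "ProjectA"),
    ("Tom", "ProjectA"),    # Tom worked directly with Sarah
    ("Tom", "ProjectB"),
    ("Priya", "ProjectB"),  # Priya is a colleague of a colleague
]


def neighbours(node):
    """Yield (relationship, other_end) for each WORKED_ON touching node, either direction."""
    for rel in WORKED_ON:
        person, project = rel
        if node == person:
            yield rel, project
        elif node == project:
            yield rel, person


def undirected_walks(start, min_hops, max_hops, used):
    """Yield (end_node, rels) for walks of min_hops..max_hops WORKED_ON hops.

    A relationship already in `used` or earlier in the walk is never reused,
    mirroring Cypher's rule that a path does not repeat a relationship.
    """
    stack = [(start, ())]
    while stack:
        node, rels = stack.pop()
        if len(rels) >= min_hops:
            yield node, rels
        if len(rels) < max_hops:
            for rel, other in neighbours(node):
                if rel not in used and rel not in rels:
                    stack.append((other, rels + (rel,)))


def match_colleagues(subject):
    """List (person, worked_on_count) matches for the anchored pattern
    (subject)-[:WORKED_ON]->()-[:WORKED_ON*0..2]-()<-[:WORKED_ON]-(person)."""
    matches = []
    for first in WORKED_ON:
        if first[0] != subject:
            continue  # fixed outgoing hop from the subject to a project
        for end, middle in undirected_walks(first[1], 0, 2, {first}):
            for last in WORKED_ON:
                person, target = last
                if target != end:
                    continue  # fixed incoming hop from the person to the walk's end
                if person == subject:
                    continue  # assumed filter: the subject is not her own colleague
                if last in middle:
                    continue  # no relationship may appear twice in one path
                matches.append((person, 2 + len(middle)))
    return matches


if __name__ == "__main__":
    for person, count in match_colleagues("Sarah"):
        print(person, count, "WORKED_ON relationships")
```

For Sarah, the sketch finds Tom through 2 WORKED_ON relationships and Priya through 4. A one-hop middle segment ends on a person, and a person cannot be the target of an incoming WORKED_ON relationship, so an odd count never occurs. Adding the final INTERESTED_IN hop gives pathLength values of 3 and 5, and hence distances of 1 and 2.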

If we run this query against our sample graph, again taking Sarah as the subject of the query, and look for colleagues and colleagues-of-colleagues who have interests in Java, travel, or medicine, we get the following results:
