Using Graph Queries to Extract Neighborhoods
Now that you know a few of the basics of Gremlin syntax and you've
explored the structure of the data a little, it's time to put it to use. The
goal of this exercise is to analyze product associations represented by
co-purchasing and reviews to gain insights that will be useful for marketing
and advertising around a particular book. A subgraph of products
representing the neighborhood of interest will be output for visualization
and analysis.
Customers link products through reviews, and, unlike the co-purchasing
"similar" list, review links are not limited to five per product. You'll start with a single
product, extract related products and edges between them, and export the
resulting subgraph for visualization and further analysis. For this exercise,
you focus on associated interests for one of Edward Tufte's seminal
visualization books, Envisioning Information (Cheshire, CT: Graphics
Press, 1990).
Begin by finding the topic and storing a reference to it. Note that one of the
limitations of using the Lucene index in Titan is that each term is indexed
separately, making it necessary to query separately for "Envisioning" and
"Information." Because the output of a Gremlin pipeline is a list, add a call
to next() to store a reference to the topic itself, the first item in the list.
gremlin> tufteBook = g.V.has('title',CONTAINS,'Envisioning').
                         has('title',CONTAINS,'Information').next()
==>v[4745708]
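Before building on the stored reference, it can help to confirm that the lookup
returned the intended vertex. In the Gremlin 2 console, calling map() on a vertex
returns its property map, so you can check the title and other attributes at a
glance (the exact properties shown will depend on the dataset's schema):

gremlin> tufteBook.map()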
Before collecting related nodes, it's a good idea to do a quick sanity check on
the counts of products that are linked through co-purchasing or co-review,
being careful not to count the same nodes twice.
gremlin> tufteBook.both('similar').dedup().count()
==>25
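With the neighborhood size verified, the related products and the edges between
them can be gathered for export. The sketch below uses Gremlin 2 pipeline syntax;
the variable names are illustrative, and the filter keeps only 'similar' edges
whose endpoints both fall inside the collected neighborhood:

gremlin> neighbors = ([tufteBook] + tufteBook.both('similar').dedup().toList()) as Set
gremlin> edges = tufteBook.both('similar').dedup().bothE('similar').
                     filter{ neighbors.contains(it.getVertex(IN)) &&
                             neighbors.contains(it.getVertex(OUT)) }.
                     dedup().toList()

The neighbors set and edges list together describe the subgraph, which can then
be written out in whatever format your visualization tool expects.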