Using Graph Queries to Extract Neighborhoods
Now that you know a few of the basics of Gremlin syntax and you've
explored the structure of the data a little, it's time to put it to use. The
goal of this exercise is to analyze product associations represented by
co-purchasing and reviews to gain insights that will be useful for marketing
and advertising around a particular book. A subgraph of products
representing the neighborhood of interest will be output for visualization
and analysis.
Customers link products through reviews, and, unlike the co-purchasing
"similar" list, review links are not limited to five per product. You'll start with a single
product, extract related products and edges between them, and export the
resulting subgraph for visualization and further analysis. For this exercise,
you focus on associated interests for one of Edward Tufte's seminal
visualization books, Envisioning Information (Cheshire, CT: Graphics
Press, 1990).
Begin by finding the topic and storing a reference to it. Note that one of the
limitations of using the Lucene index in Titan is that each term is indexed
separately, making it necessary to query separately for "Envisioning" and
"Information." Because the output of a Gremlin pipeline is a list, add a call
to next() to store a reference to the topic itself, the first item in the list.
gremlin> tufteBook = g.V.has('title',CONTAINS,'Envisioning').
                         has('title',CONTAINS,'Information').next()
==>v[4745708]
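Before building on the stored reference, it can help to confirm that the lookup
returned the intended vertex. In the Gremlin 2 console, calling map() on a vertex
returns its property map, so you can check the title and other attributes at a
glance (the exact properties shown will depend on the dataset's schema):

gremlin> tufteBook.map()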
Before collecting related nodes, it's a good idea to do a quick sanity check on
the counts of products that are linked through co-purchasing or co-review,
being careful not to count the same nodes twice.
gremlin> tufteBook.both('similar').dedup().count()
==>25
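With the neighborhood size verified, the related products and the edges between
them can be gathered for export. The sketch below uses Gremlin 2 pipeline syntax;
the variable names are illustrative, and the filter keeps only 'similar' edges
whose endpoints both fall inside the collected neighborhood:

gremlin> neighbors = ([tufteBook] + tufteBook.both('similar').dedup().toList()) as Set
gremlin> edges = tufteBook.both('similar').dedup().bothE('similar').
                     filter{ neighbors.contains(it.getVertex(IN)) &&
                             neighbors.contains(it.getVertex(OUT)) }.
                     dedup().toList()

The neighbors set and edges list together describe the subgraph, which can then
be written out in whatever format your visualization tool expects.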