Doing this type of analysis using an RDBMS would be slow. In the social networking scenario, you can create a "friends" table with a row for each relationship and three columns: the ID of the first person, the ID of the second person, and the relationship type (family, close friend, or acquaintance). You can then index that table on both the first and second person, and an RDBMS will quickly return a list of your friends and your friends-of-friends. But to determine the next level of relationships, another SQL query is required. As you continue to build out the relationships, the size of each query grows quickly. If you have 100 friends who each have 100 friends, the friends-of-friends query (the second level of friends) returns 10,000 (100 x 100) rows. As you might guess, doing this type of query in SQL can become complex.
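To make the relational approach concrete, here is a minimal sketch assuming a hypothetical friends table and column names (the book doesn't specify a schema); it shows how the friends-of-friends query becomes a self-join on the same table:

```python
import sqlite3

# Hypothetical schema; the table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE friends (
        person_id INTEGER,   -- ID of the first person
        friend_id INTEGER,   -- ID of the second person
        rel_type  TEXT       -- 'family', 'close friend', or 'acquaintance'
    )
""")
conn.execute("CREATE INDEX idx_person ON friends (person_id)")
conn.execute("CREATE INDEX idx_friend ON friends (friend_id)")
conn.executemany(
    "INSERT INTO friends VALUES (?, ?, ?)",
    [(1, 2, "close friend"), (2, 3, "acquaintance"), (2, 4, "family")],
)

# Friends-of-friends for person 1: a self-join; every deeper level
# of relationships requires joining the table onto itself again.
rows = conn.execute("""
    SELECT DISTINCT f2.friend_id
    FROM friends AS f1
    JOIN friends AS f2 ON f2.person_id = f1.friend_id
    WHERE f1.person_id = ? AND f2.friend_id <> ?
""", (1, 1)).fetchall()
print(rows)   # e.g. [(3,), (4,)]
```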
Graph stores can perform these operations much faster by using techniques that
consolidate and remove unwanted nodes from memory. Though graph stores would
clearly be much faster for link analysis tasks, they usually require enough RAM to store
all the links during analysis.
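For contrast, here is a minimal illustrative sketch of the in-memory, level-by-level traversal a graph store performs; the data and function are invented for this example and don't represent any particular product's API:

```python
# Illustrative adjacency list: person -> set of direct friends.
friends = {
    "alice": {"bob", "carol"},
    "bob":   {"alice", "dave"},
    "carol": {"alice", "erin"},
    "dave":  {"bob"},
    "erin":  {"carol"},
}

def friends_at_level(start, level):
    """Return the people exactly `level` links away from `start`,
    pruning already-visited nodes so no one is counted twice."""
    visited = {start}
    frontier = {start}
    for _ in range(level):
        next_frontier = set()
        for person in frontier:
            next_frontier |= friends.get(person, set()) - visited
        visited |= next_frontier
        frontier = next_frontier
    return frontier

print(friends_at_level("alice", 2))   # friends-of-friends: {'dave', 'erin'}
```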
Graph stores are used for things beyond social networking: they're appropriate for identifying distinct patterns of connections between nodes. For example, creating a graph of all incoming and outgoing phone calls between people in a prison might show a concentration of calls (patterns) associated with organized crime. Analyzing the movement of funds between bank accounts might show patterns of money laundering or credit card fraud. Companies that are under criminal investigation might have all of their email messages analyzed using graph software to see who sent whom what information and when. Law firms, law enforcement agencies, intelligence agencies, and banks are the most frequent users of graph store systems, both for analyzing legitimate activities and for fraud detection.
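To make the call-analysis example concrete, a small illustrative sketch (the call records here are invented) can count how many distinct contacts each person in a call graph has; nodes with an unusually high count are the kind of concentration described above:

```python
from collections import defaultdict

# Illustrative call records: (caller, callee) pairs.
calls = [
    ("A", "X"), ("B", "X"), ("C", "X"), ("D", "X"),
    ("X", "E"), ("A", "B"),
]

# Build an undirected call graph and count each node's distinct contacts.
contacts = defaultdict(set)
for caller, callee in calls:
    contacts[caller].add(callee)
    contacts[callee].add(caller)

# Nodes with many contacts are candidates for closer review.
hubs = sorted(contacts, key=lambda n: len(contacts[n]), reverse=True)
for node in hubs[:3]:
    print(node, len(contacts[node]))   # "X" stands out with 5 contacts
```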
Graph stores are also useful for linking together data and searching for patterns
within large collections of text documents. Entity extraction is the process of identifying
the most important items (entities) in a document. Entities are usually the nouns in a document, such as people, dates, places, and products. Once the key entities have been
identified, they're used to perform advanced search functions. For example, if you
know all the dates and people mentioned in a document, you can create a report that
shows which documents mention what people and when.
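As a hedged sketch of that kind of report, the example below assumes the open-source spaCy library and its small English model (the text doesn't prescribe a tool, and the exact entities found depend on the model); it extracts PERSON and DATE entities and indexes documents by the people they mention:

```python
import spacy
from collections import defaultdict

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

documents = {
    "doc1": "John Adams was born on October 19, 1735.",
    "doc2": "Abigail Adams wrote to John Adams in 1776.",
}

# Map each person mentioned to the documents (and dates) that mention them.
mentions = defaultdict(list)
for doc_id, text in documents.items():
    parsed = nlp(text)
    people = [e.text for e in parsed.ents if e.label_ == "PERSON"]
    dates = [e.text for e in parsed.ents if e.label_ == "DATE"]
    for person in people:
        mentions[person].append((doc_id, dates))

for person, hits in mentions.items():
    print(person, hits)   # which documents mention which people, and when
```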
This entity extraction process (a type of natural language processing, or NLP) can be combined with other tools to extract simple facts or assertions made within a document. For example, the sentence "John Adams was born on October 19, 1735" can be broken into the following assertions:

1. A person record was found with the name of John Adams and is a subject.
2. The born-on relationship links the subject to the object.
3. A date object record was found that has the value of October 19, 1735.
Although simple assertions can be easy to find with basic NLP processing, the process of fully understanding every sentence can be complex and dependent on the context of the situation. Our key takeaway is that if assertions are found in text, they can best be represented in graph structures.
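As an illustrative sketch (the node labels and predicate name are assumptions, not a particular graph store's syntax), the assertions above map naturally onto subject-predicate-object triples:

```python
# Each assertion becomes a (subject, predicate, object) triple:
# the subject and object are nodes, the predicate is the edge label.
triples = [
    ("person:john_adams", "born-on", "date:1735-10-19"),
]

# A tiny in-memory graph built from the triples.
graph = {}
for subject, predicate, obj in triples:
    graph.setdefault(subject, []).append((predicate, obj))

# Follow the born-on edge from the John Adams node.
for predicate, obj in graph["person:john_adams"]:
    print(predicate, "->", obj)   # born-on -> date:1735-10-19
```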