Doing this type of analysis using an RDBMS would be slow. In the social networking scenario, you can create a "friends" table with a row for each relationship and three columns: the ID of the first person, the ID of the second person, and the relationship type (family, close friend, or acquaintance). You can then index that table on both the first and second person, and an RDBMS will quickly return a list of your friends and your friends-of-friends. But to determine the next level of relationships, another SQL query is required. As you continue to build out the relationships, the size of each query grows quickly. If you have 100 friends who each have 100 friends, the friends-of-friends query (the second level of friends) returns 10,000 (100 x 100) rows. As you might guess, doing this type of query in SQL can become complex.
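To make the relational approach concrete, here is a minimal sketch assuming a hypothetical friends table and column names (the book doesn't specify a schema); it shows how the friends-of-friends query becomes a self-join on the same table:

```python
import sqlite3

# Hypothetical schema; the table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE friends (
        person_id INTEGER,   -- ID of the first person
        friend_id INTEGER,   -- ID of the second person
        rel_type  TEXT       -- 'family', 'close friend', or 'acquaintance'
    )
""")
conn.execute("CREATE INDEX idx_person ON friends (person_id)")
conn.execute("CREATE INDEX idx_friend ON friends (friend_id)")
conn.executemany(
    "INSERT INTO friends VALUES (?, ?, ?)",
    [(1, 2, "close friend"), (2, 3, "acquaintance"), (2, 4, "family")],
)

# Friends-of-friends for person 1: a self-join; every deeper level
# of relationships requires joining the table onto itself again.
rows = conn.execute("""
    SELECT DISTINCT f2.friend_id
    FROM friends AS f1
    JOIN friends AS f2 ON f2.person_id = f1.friend_id
    WHERE f1.person_id = ? AND f2.friend_id <> ?
""", (1, 1)).fetchall()
print(rows)   # e.g. [(3,), (4,)]
```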
Graph stores can perform these operations much faster by using techniques that
consolidate and remove unwanted nodes from memory. Though graph stores would
clearly be much faster for link analysis tasks, they usually require enough RAM to store
all the links during analysis.
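For contrast, here is a minimal illustrative sketch of the in-memory, level-by-level traversal a graph store performs; the data and function are invented for this example and don't represent any particular product's API:

```python
# Illustrative adjacency list: person -> set of direct friends.
friends = {
    "alice": {"bob", "carol"},
    "bob":   {"alice", "dave"},
    "carol": {"alice", "erin"},
    "dave":  {"bob"},
    "erin":  {"carol"},
}

def friends_at_level(start, level):
    """Return the people exactly `level` links away from `start`,
    pruning already-visited nodes so no one is counted twice."""
    visited = {start}
    frontier = {start}
    for _ in range(level):
        next_frontier = set()
        for person in frontier:
            next_frontier |= friends.get(person, set()) - visited
        visited |= next_frontier
        frontier = next_frontier
    return frontier

print(friends_at_level("alice", 2))   # friends-of-friends: {'dave', 'erin'}
```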
Graph stores are used for things beyond social networking: they're appropriate for identifying distinct patterns of connections between nodes. For example, creating a graph of all incoming and outgoing phone calls between people in a prison might show a concentration of calls (patterns) associated with organized crime. Analyzing the movement of funds between bank accounts might show patterns of money laundering or credit card fraud. Companies that are under criminal investigation might have all of their email messages analyzed using graph software to see who sent whom what information and when. Law firms, law enforcement agencies, intelligence agencies, and banks are the most frequent users of graph store systems, both for analyzing legitimate activities and for fraud detection.
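To make the call-analysis example concrete, a small illustrative sketch (the call records here are invented) can count how many distinct contacts each person in a call graph has; nodes with an unusually high count are the kind of concentration described above:

```python
from collections import defaultdict

# Illustrative call records: (caller, callee) pairs.
calls = [
    ("A", "X"), ("B", "X"), ("C", "X"), ("D", "X"),
    ("X", "E"), ("A", "B"),
]

# Build an undirected call graph and count each node's distinct contacts.
contacts = defaultdict(set)
for caller, callee in calls:
    contacts[caller].add(callee)
    contacts[callee].add(caller)

# Nodes with many contacts are candidates for closer review.
hubs = sorted(contacts, key=lambda n: len(contacts[n]), reverse=True)
for node in hubs[:3]:
    print(node, len(contacts[node]))   # "X" stands out with 5 contacts
```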
Graph stores are also useful for linking together data and searching for patterns
within large collections of text documents. Entity extraction is the process of identifying
the most important items (entities) in a document. Entities are usually the nouns in a document, such as people, dates, places, and products. Once the key entities have been
identified, they're used to perform advanced search functions. For example, if you
know all the dates and people mentioned in a document, you can create a report that
shows which documents mention what people and when.
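As a hedged sketch of that kind of report, the example below assumes the open-source spaCy library and its small English model (the text doesn't prescribe a tool, and the exact entities found depend on the model); it extracts PERSON and DATE entities and indexes documents by the people they mention:

```python
import spacy
from collections import defaultdict

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

documents = {
    "doc1": "John Adams was born on October 19, 1735.",
    "doc2": "Abigail Adams wrote to John Adams in 1776.",
}

# Map each person mentioned to the documents (and dates) that mention them.
mentions = defaultdict(list)
for doc_id, text in documents.items():
    parsed = nlp(text)
    people = [e.text for e in parsed.ents if e.label_ == "PERSON"]
    dates = [e.text for e in parsed.ents if e.label_ == "DATE"]
    for person in people:
        mentions[person].append((doc_id, dates))

for person, hits in mentions.items():
    print(person, hits)   # which documents mention which people, and when
```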
This entity extraction process (a type of natural language processing, or NLP) can be combined with other tools to extract simple facts or assertions made within a document. For example, the sentence "John Adams was born on October 19, 1735" can be broken into the following assertions:

1. A person record was found with the name of John Adams and is a subject.
2. The born-on relationship links the subject to the object.
3. A date object record was found that has the value of October 19, 1735.
Although simple assertions can be easy to find with basic NLP processing, the process of fully understanding every sentence can be complex and dependent on the context of the situation. Our key takeaway is that if assertions are found in text, they can best be represented in graph structures.
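As an illustrative sketch (the node labels and predicate name are assumptions, not a particular graph store's syntax), the assertions above map naturally onto subject-predicate-object triples:

```python
# Each assertion becomes a (subject, predicate, object) triple:
# the subject and object are nodes, the predicate is the edge label.
triples = [
    ("person:john_adams", "born-on", "date:1735-10-19"),
]

# A tiny in-memory graph built from the triples.
graph = {}
for subject, predicate, obj in triples:
    graph.setdefault(subject, []).append((predicate, obj))

# Follow the born-on edge from the John Adams node.
for predicate, obj in graph["person:john_adams"]:
    print(predicate, "->", obj)   # born-on -> date:1735-10-19
```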