Using NoSQL to manage big data - Making Sense of NoSQL


6.10.2 Using graphs and custom shared-memory hardware to detect health care fraud

Graphs are valuable in situations where data discovery is required. Graphs can show relationships between health care beneficiaries, their claims, associated care providers, tests performed, and other relevant data. Graph analytics search through the data to find patterns of relationships between all of these entities that might indicate collusion to commit fraud.
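As a minimal sketch of this kind of relationship search, the Python below builds a toy in-memory graph from claim records and flags provider pairs that bill for an unusually large set of the same beneficiaries. The claim tuples, the function names, and the `min_shared` threshold are all hypothetical and chosen only for illustration; they are not taken from any real Medicare schema or from the system described in this section.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical claim records: (beneficiary, provider, test performed).
CLAIMS = [
    ("ben1", "provA", "mri"),
    ("ben1", "provB", "mri"),
    ("ben2", "provA", "xray"),
    ("ben2", "provB", "xray"),
    ("ben3", "provA", "mri"),
    ("ben3", "provB", "mri"),
    ("ben4", "provC", "blood"),
]


def build_graph(claims):
    """Map each provider to the set of beneficiaries it has billed for."""
    patients_of = defaultdict(set)
    for beneficiary, provider, _test in claims:
        patients_of[provider].add(beneficiary)
    return patients_of


def shared_patient_pairs(patients_of, min_shared=3):
    """Return provider pairs that share at least `min_shared` beneficiaries."""
    pairs = []
    for a, b in combinations(sorted(patients_of), 2):
        shared = patients_of[a] & patients_of[b]
        if len(shared) >= min_shared:
            pairs.append((a, b, sorted(shared)))
    return pairs


suspicious = shared_patient_pairs(build_graph(CLAIMS))
print(suspicious)
```

A shared-patient count is only one possible signal; a real analysis would combine many such patterns before treating a relationship as evidence of collusion.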

The graph representing Medicare data is large: it represents six million providers, a hundred million patients, and billions of claim records. The graph links health care providers, diagnostic tests, and common treatments to each patient and that patient's claim records. This amount of data can't be held in the memory of a single server, and partitioning the data across multiple nodes in a computing cluster isn't feasible. Attempts to do so may result in incomplete queries because so many links cross partition boundaries, data must be paged in and out of memory, and slower network and storage speeds add delays. Meanwhile, fraud continues to occur at an alarming rate.
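To make the partitioning problem concrete, here is a small sketch that measures what fraction of edges end up crossing partition boundaries when nodes are assigned to partitions by a simple modulo rule. The graph is random and synthetic, and `cut_fraction` is a hypothetical helper written for this illustration. Real health care graphs are not random, so the numbers only suggest the trend: when endpoints are placed independently, roughly (k-1)/k of the edges cross a boundary with k partitions.

```python
import random


def cut_fraction(edges, num_partitions):
    """Fraction of edges whose two endpoints land on different partitions.

    Nodes are integers and are assigned to partition `node % num_partitions`.
    """
    if not edges:
        return 0.0
    crossing = sum(
        1 for u, v in edges if u % num_partitions != v % num_partitions
    )
    return crossing / len(edges)


# Synthetic graph: 10,000 random edges over 1,000 nodes (seeded for repeatability).
rng = random.Random(42)
edges = [(rng.randrange(1000), rng.randrange(1000)) for _ in range(10_000)]

for k in (1, 2, 8, 64):
    print(f"{k:>3} partitions: {cut_fraction(edges, k):.3f} of edges cross")
```

Every crossing edge is a traversal step that would have to go over the network in a cluster, which is the cost a single shared-memory store avoids.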

Medicare fraud analytics requires an in-memory graph solution that can merge heterogeneous data from a variety of sources, use queries to find patterns, and discover similarities as well as exact matches. With every item of data loaded into memory, there's no need to contend with the issue of graph partitioning. The graph can be updated dynamically with new data, and existing queries can integrate the new data into the analytics being performed, making the discovery of hidden relationships in the data feasible.
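The difference between exact and similarity matching, and the effect of updating the graph while a query stays unchanged, can be sketched as follows. The `ClaimGraph` class, the procedure codes, and the use of Jaccard similarity are assumptions made for this example; the source does not say which similarity measure the deployed system uses.

```python
from collections import defaultdict
from itertools import combinations


class ClaimGraph:
    """Toy in-memory graph: provider -> set of billed procedure codes."""

    def __init__(self):
        self.codes_of = defaultdict(set)

    def add_claim(self, provider, code):
        """Add new data in place; later queries see it immediately."""
        self.codes_of[provider].add(code)

    def jaccard(self, a, b):
        """Jaccard similarity of two providers' code sets (1.0 = exact match)."""
        codes_a = self.codes_of.get(a, set())
        codes_b = self.codes_of.get(b, set())
        union = codes_a | codes_b
        return len(codes_a & codes_b) / len(union) if union else 0.0

    def similar_pairs(self, threshold):
        """Return provider pairs whose similarity meets the threshold."""
        return [
            (a, b, self.jaccard(a, b))
            for a, b in combinations(sorted(self.codes_of), 2)
            if self.jaccard(a, b) >= threshold
        ]


graph = ClaimGraph()
for provider, code in [
    ("provA", "c1"), ("provA", "c2"), ("provA", "c3"),
    ("provB", "c1"), ("provB", "c2"), ("provB", "c3"),
    ("provC", "c1"), ("provC", "c2"), ("provC", "c4"),
]:
    graph.add_claim(provider, code)

before = graph.similar_pairs(threshold=0.7)  # only the exact match qualifies
graph.add_claim("provC", "c3")               # new data arrives
after = graph.similar_pairs(threshold=0.7)   # same query, updated graph
print(before)
print(after)
```

Running the same `similar_pairs` query before and after the update shows how a pair that was not similar enough can surface once new claims are loaded, without rebuilding or repartitioning anything.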

Figure 6.17 shows the high-level architecture of how shared-memory systems are

used to look for patterns in large graphs.

With these requirements in mind, a US federally funded lab with a mandate to identify Medicare and Medicaid fraud deployed YarcData's Urika appliance. The appliance is capable of scaling from 1 to 512 terabytes of memory, shared by up to 8,192

Figure 6.17 How large graphs are loaded into a central shared-memory structure. This example shows a graph in a central multi-terabyte RAM store with potentially hundreds or thousands of simultaneous threads in CPUs performing queries on the graph. Note that, like other NoSQL systems, the data stays in RAM while the analysis is processing. Each CPU can perform an independent query on the graph without interfering with the others.
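The access pattern in the figure, many independent read-only queries against one graph held in memory, can be imitated on a very small scale with a thread pool. This is only a sketch: the `GRAPH` dictionary and the `neighbors_within` query are hypothetical, and in standard CPython the threads typically do not run CPU-bound work in parallel, so the code illustrates the sharing model rather than the performance of dedicated shared-memory hardware.

```python
from concurrent.futures import ThreadPoolExecutor

# One shared adjacency list held in memory; queries only read it.
GRAPH = {
    "provA": ["ben1", "ben2"],
    "provB": ["ben2", "ben3"],
    "ben1": ["provA"],
    "ben2": ["provA", "provB"],
    "ben3": ["provB"],
}


def neighbors_within(start, hops):
    """Breadth-first search: nodes reachable from `start` in at most `hops` edges."""
    seen = {start}
    frontier = [start]
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for neighbor in GRAPH.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return seen - {start}


# Each query keeps its own `seen` and `frontier`, so queries do not interfere.
queries = [("provA", 2), ("ben3", 1), ("ben1", 3)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda q: neighbors_within(*q), queries))

for query, result in zip(queries, results):
    print(query, sorted(result))
```

Because each query keeps its working state local and never writes to the shared graph, no locking is needed in this sketch.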
