Using NoSQL to manage big data - Making Sense of NoSQL


need. This is ideal for creating materialized views and storing them in your RDBMSs or NoSQL database.

Although Apache Flume was originally written for processing log files, it's a general-purpose tool and can be used on other types of immutable big data problems, such as output from data loggers or raw data from web crawling systems. As data loggers become cheaper, tools like Apache Flume will be needed to preprocess more big data problems.

6.10 Case study: computer-aided discovery of health care fraud

In this case study, we'll take a look at a problem that can't be easily solved using a shared-nothing architecture: looking for patterns of fraud in large graphs. Highly connected graphs aren't partition tolerant, meaning that you can't efficiently divide the queries on a graph across two or more shared-nothing processors. If your graph is too large to fit in the RAM of a commodity processor, you may need to look at an alternative to a shared-nothing system.
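To make the partitioning problem concrete, here's a minimal Python sketch that assigns the nodes of a random graph to shared-nothing partitions and measures the fraction of edges whose endpoints land on different partitions. The function names, the modulo assignment rule, and the graph sizes are illustrative assumptions, not part of any particular graph product. Real systems use smarter partitioners, but the sketch shows why traversals on densely interconnected data tend to cross machine boundaries.

```python
import random


def random_graph(num_nodes, num_edges, seed=0):
    """Build a list of distinct undirected edges between random node pairs."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < num_edges:
        u, v = rng.sample(range(num_nodes), 2)  # two distinct node IDs
        edges.add((min(u, v), max(u, v)))
    return sorted(edges)


def cut_fraction(edges, num_partitions):
    """Fraction of edges whose endpoints live on different partitions.

    Nodes are assigned to partitions by node ID modulo num_partitions,
    a stand-in (assumed here) for hash partitioning in a shared-nothing cluster.
    """
    crossing = sum(
        1 for u, v in edges if u % num_partitions != v % num_partitions
    )
    return crossing / len(edges)


if __name__ == "__main__":
    edges = random_graph(num_nodes=1000, num_edges=5000)
    for k in (1, 2, 4, 8):
        print(k, "partitions:", round(cut_fraction(edges, k), 3))
```

When edges are placed at random, the crossing fraction approaches 1 - 1/k for k partitions, so adding processors increases the share of traversal steps that require a network hop instead of reducing it.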

This case study is important because it explores the limits of what a cluster of shared-nothing systems can do. We include it to counter the tendency of architects to recommend large shared-nothing clusters for all problems. Although shared-nothing architectures work for many big data problems, they don't scale linearly on highly connected data such as graphs or RDBMSs containing joins. Looking for hidden patterns in large graphs is one area that's best solved with a custom hardware approach.

6.10.1 What is health care fraud detection?

The US Office of Management and Budget estimates that improper payments in Medicare and Medicaid came to $50.7 billion in 2010, nearly 8.5% of the annual Medicare budget. A portion of this staggering figure is the result of improper documentation, but it's certain that Medicare fraud costs taxpayers tens of billions of dollars annually.

Existing efforts to detect fraud have focused on searching for suspicious submissions from individual beneficiaries and health care providers. These efforts yielded $4.1 billion in fraud recovery in 2011, around 10% of the total estimated fraud. Unfortunately, fraud is becoming more sophisticated, and detection must move beyond the search for individuals to the discovery of patterns of collusion among multiple beneficiaries and/or health care providers. Identifying these patterns is challenging, as fraudulent behaviors continuously change, requiring the analyst to hypothesize that a pattern of relationships could indicate fraud, visualize and evaluate the results, and iteratively refine their hypothesis.
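As a rough illustration of what hypothesizing a pattern of relationships can look like in practice, the following Python sketch tests one hypothetical pattern: pairs of providers that share an unusually large number of beneficiaries. The claim records, the identifiers, the function name, and the `min_shared` threshold are all invented for this example; it isn't the method used in the case study, and a shared patient population alone doesn't indicate fraud.

```python
from collections import defaultdict
from itertools import combinations


def shared_beneficiary_pairs(claims, min_shared):
    """Return provider pairs that share at least min_shared beneficiaries.

    claims is an iterable of (provider_id, beneficiary_id) tuples.
    The result maps each (provider_a, provider_b) pair to the sorted
    list of beneficiaries the two providers have in common.
    """
    # Group providers by the beneficiary they billed for.
    providers_by_beneficiary = defaultdict(set)
    for provider, beneficiary in claims:
        providers_by_beneficiary[beneficiary].add(provider)

    # Every pair of providers seen for one beneficiary shares that beneficiary.
    shared = defaultdict(set)
    for beneficiary, providers in providers_by_beneficiary.items():
        for a, b in combinations(sorted(providers), 2):
            shared[(a, b)].add(beneficiary)

    return {
        pair: sorted(beneficiaries)
        for pair, beneficiaries in shared.items()
        if len(beneficiaries) >= min_shared
    }


# Hypothetical claim records: (provider, beneficiary).
sample_claims = [
    ("P1", "B1"), ("P2", "B1"),
    ("P1", "B2"), ("P2", "B2"),
    ("P1", "B3"), ("P2", "B3"), ("P3", "B3"),
    ("P3", "B4"),
]
flagged = shared_beneficiary_pairs(sample_claims, min_shared=3)
```

An analyst would inspect the flagged pairs, then adjust the threshold or swap in a different relationship pattern and run the query again, which is the iterative refinement loop described above.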
