artificially constrained. After all, it's harder to find outliers when you don't
store data profiling the attributes of those outliers. That said, having a larger
set of data and attributes is only useful if you have the compute capacity that's required to churn through it all and find the signals that are buried within the noise. It's also critical to load and process data quickly
enough to trap fast-moving events.
Fraud cases traditionally involve the use of samples and models to identify customers who exhibit a certain kind of profile. Although it works, the problem with this approach (and this is a trend that you're going to see in a lot of these use cases) is that you're profiling a segment rather than working at the level of the individual transaction or person. Making a forecast based on a segment is good, but making a decision that's based on the actual particulars of an individual, correlated with their transactions, is obviously better. To do this, you need to work with a larger set of data than is possible with traditional approaches.
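To make the contrast concrete, here's a minimal sketch of what individual-level scoring can look like: a transaction is judged against that customer's own history rather than a segment profile. This isn't the IBM platform's method or any particular product's API; the field values, the z-score statistic, and the flagging threshold are all illustrative assumptions.

    # A minimal sketch (illustrative only): score a new transaction against
    # this customer's own spending history instead of a segment average.
    from statistics import mean, stdev

    def transaction_outlier_score(history_amounts, new_amount):
        """Z-score of the new amount relative to the customer's history."""
        if len(history_amounts) < 2:
            return 0.0                      # not enough history to judge
        mu = mean(history_amounts)
        sigma = stdev(history_amounts) or 1.0
        return abs(new_amount - mu) / sigma

    # Example: a customer whose purchases cluster around $40 suddenly spends $900.
    history = [38.20, 41.75, 36.10, 44.00, 39.95]
    score = transaction_outlier_score(history, 900.00)
    if score > 3.0:                          # assumed threshold for review
        print(f"flag for review (z = {score:.1f})")

A segment model might never flag this purchase if $900 is unremarkable for the customer's demographic; scoring against the individual's own behavior catches it immediately, which is exactly why you want all of the available history on hand.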
We estimate that less than 50 percent (and usually much less than that) of the
available information that could be useful for fraud modeling is actually being
used. You might think that the solution would be to load that remaining data into your traditional analytic warehouse. The reasons why this isn't practical come up in most Big Data usage patterns, namely: the
data won't fit; it'll contain data types that the warehouse can't effectively use;
it'll most likely require disruptive schema changes; and it could very well
slow your existing workloads to a crawl.
If stuffing the rest of the data into existing warehouses isn't going to work,
then what will? We think that the core engines of the IBM Big Data platform
(BigInsights, Streams, and the analytics-based IBM PureData Systems) give
you the flexibility and agility to take your fraud models to the next level. BigIn-
sights addresses the concerns we outlined in the previous paragraph, because
it will scale to just about any volume and handle any data type required. Be-
cause it doesn't impose a schema on write, you'll have maximum flexibility
in how you organize your data, and your work won't impact existing work-
loads and other systems. Finally, you can start small with BigInsights and grow in a highly cost-effective manner (trust us when we say that your CIO will like this part).
Now that you have BigInsights to provide an elastic and cost-effective
repository for all of the available data, how do you go about finding those
outliers?