Here are three reasons why:
Big data memory requirements are costly. Some analytics
solutions require you to load all of your data into memory,
forcing you to invest in very expensive hardware or, more likely,
subjecting your analytics solution to scalability constraints.
The ideal solution allows you to choose the optimal trade-off
between storing data in memory and in a database, and allows
you to accelerate performance by adding more memory to the
system without being subject to the memory size constraints of
traditional proprietary solutions.
Databases are more powerful when it comes to complex
calculations. With an in-memory-only solution, complex
calculations on large data sets can easily result in an "out of memory"
error. To resolve this, you are forced either to add memory
capacity, slim down your data sets, or modify your calculations
(and, as a result, spend hours remodeling your data sets).
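One way to see the difference is to compare pulling every row into application memory with pushing the aggregation down to the database engine. The sketch below is illustrative only (the table and column names are hypothetical, and SQLite stands in for whatever database you use); only the second query keeps the application's memory footprint proportional to the result rather than the raw data.

```python
import sqlite3

# Hypothetical table of sensor readings; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(i % 10, float(i)) for i in range(100_000)],
)

# In-memory approach: fetch every row, then aggregate in application code.
# With a large enough table, this is where "out of memory" errors appear.
rows = conn.execute("SELECT sensor_id, value FROM readings").fetchall()

# Database approach: push the aggregation down to the engine, so only
# the small result set ever crosses into application memory.
result = conn.execute(
    "SELECT sensor_id, AVG(value) AS avg_value "
    "FROM readings GROUP BY sensor_id"
).fetchall()
print(len(result))  # 10 aggregated rows instead of 100,000 raw rows
```

The same principle applies at scale: a database can spill intermediate state to disk and parallelize the work, while the in-memory fetch is bounded by a single process's RAM.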
For up-to-the-minute information, you still need your data
closest to its source. If things are changing so fast that you need
to see them in real time, you need a live connection to your data.
For example, some operational analysis applications, like those
used by financial services organizations, need competitive,
real-time, or near-real-time data. Your operational dashboards
can be hooked up directly to live data so you know when you
are facing peak demand or under-utilization. An all-in-memory
solution would not provide the latest, freshest data.
Real-time Analytics and the CAP Theorem
Big data refers to the volume, velocity, and variety of highly structured, semi-structured,
and loosely structured data that is in motion (streaming) and at rest (stored). Most
approaches to big data analytics are focused on batch processing of data, in essence big
data at rest. This means that analytic results such as trends and patterns only consider
what has happened in the past and not what is happening in the present.
What about big data in motion?
If you recall our discussion regarding the CAP theorem in Chapter 5, it stipulates
that it is impossible for any distributed computing system to simultaneously guarantee
consistency, availability, and partition tolerance; you can at best achieve two out of three.
A system with high partition tolerance and availability (like Cassandra) will sacrifice some
consistency in order to do so. Similarly, for real-time analytics solutions, there is a variant
of the CAP theorem, called SCV (Figure 8-1).
 