Here are three reasons why:
Big data memory requirements are costly. Some analytics
solutions require you to load all of your data into memory,
forcing you to invest in very expensive hardware or, more likely,
subjecting your analytics solution to scalability constraints.
The ideal solution allows you to choose the optimal trade-off
between storing data in memory and in a database, and allows
you to accelerate performance by adding more memory to the
system without being subject to the memory size constraints of
traditional proprietary solutions.
Databases are more powerful when it comes to complex
calculations. With an in-memory-only solution, complex
calculations on large data sets can easily result in an "out of memory"
error. To resolve this, you are forced either to add memory
capacity, slim down your data sets, or modify your calculations
(and, as a result, spend hours remodeling your data sets).
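One way to see the difference is to compare pulling every row into application memory with pushing the aggregation down to the database engine. The sketch below is illustrative only (the table and column names are hypothetical, and SQLite stands in for whatever database you use); only the second query keeps the application's memory footprint proportional to the result rather than the raw data.

```python
import sqlite3

# Hypothetical table of sensor readings; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(i % 10, float(i)) for i in range(100_000)],
)

# In-memory approach: fetch every row, then aggregate in application code.
# With a large enough table, this is where "out of memory" errors appear.
rows = conn.execute("SELECT sensor_id, value FROM readings").fetchall()

# Database approach: push the aggregation down to the engine, so only
# the small result set ever crosses into application memory.
result = conn.execute(
    "SELECT sensor_id, AVG(value) AS avg_value "
    "FROM readings GROUP BY sensor_id"
).fetchall()
print(len(result))  # 10 aggregated rows instead of 100,000 raw rows
```

The same principle applies at scale: a database can spill intermediate state to disk and parallelize the work, while the in-memory fetch is bounded by a single process's RAM.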
For up-to-the-minute information, you still need your data
closest to its source. If things are changing so fast that you need
to see them in real time, you need a live connection to your data.
For example, some operational analysis applications, like those
used by financial services organizations, need competitive,
real-time, or near-real-time data. Your operational dashboards
can be hooked up directly to live data so you know when you
are facing peak demand or under-utilization. An all-in-memory
solution would not provide the latest, freshest data.
Real-time Analytics and the CAP Theorem
Big data refers to the volume, velocity, and variety of highly structured, semi-structured,
and loosely structured data that is in motion (streaming) and at rest (stored). Most
approaches to big data analytics are focused on batch processing of data, in essence big
data at rest. This means that analytic results such as trends and patterns only consider
what has happened in the past and not what is happening in the present.
What about big data in motion?
If you recall our discussion regarding the CAP theorem in Chapter 5, it stipulates
that it is impossible for any distributed computing system to simultaneously guarantee
consistency, availability, and partition tolerance; you can at best achieve two out of three.
A system with high partition tolerance and availability (like Cassandra) will sacrifice some
consistency in order to do so. Similarly, for real-time analytics solutions, there is a variant
of the CAP theorem, called SCV (Figure 8-1).
 