Figure 8-1. SCV for real-time analytics systems
Speed: This is all about fast response times: how quickly the system can return an
appropriate analytic result after the underlying event is first observed. In essence, a real-time
system will have an updated analytic result within a relatively short time of an observed
event, whereas a non-real-time system might take hours or even days to process all of the
observations into an analytic result.
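As a rough sketch in Python (the event shape and the RunningMean class here are hypothetical, not taken from the text), speed can be measured as the elapsed time between when an event is observed and when the analytic result reflecting it becomes available:

    import time

    class RunningMean:
        """A continuously updated analytic result."""
        def __init__(self):
            self.count = 0
            self.total = 0.0

        def update(self, value):
            self.count += 1
            self.total += value

        @property
        def mean(self):
            return self.total / self.count

    stats = RunningMean()
    event = {"observed_at": time.time(), "value": 42.0}  # hypothetical event shape
    stats.update(event["value"])
    latency = time.time() - event["observed_at"]  # observation-to-result delay
    print(f"mean={stats.mean:.1f}, latency={latency * 1000:.3f} ms")

A real-time system keeps that latency small for every event; a batch system lets it grow to hours or days.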
Consistency: This is all about confidence in the response, that is, how accurate or
precise (two different things) the analytic outcome is. A totally consistent result
accounts for 100 percent of observed data with complete accuracy and some degree of
precision. A less consistent system might use statistical sampling or
approximations to produce a reasonably precise but less accurate result.
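That trade-off can be sketched in a few lines of Python (the observation data and sample size below are hypothetical): the exact aggregate touches every observation, while the sampled one is faster but only approximately correct.

    import random

    def exact_mean(values):
        """Fully consistent: accounts for 100 percent of observations."""
        return sum(values) / len(values)

    def sampled_mean(values, sample_size=1000):
        """Less consistent: a random sample trades accuracy for speed."""
        sample = random.sample(values, min(sample_size, len(values)))
        return sum(sample) / len(sample)

    observations = [random.gauss(100, 15) for _ in range(1_000_000)]
    print(f"exact:   {exact_mean(observations):.3f}")
    print(f"sampled: {sampled_mean(observations):.3f}")  # close, but not exact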
Data Volume: This is all about the coverage or reach of the analytical result;
in other words, the total amount of observed events and data that must be
analyzed. The problem starts when the data exceeds what can fit into memory on a
single machine: massive or rapidly growing data sets have to be analyzed by
distributed systems.
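Once the working set no longer fits on one machine, the analysis itself has to be decomposed so that each node processes only its own partition and a coordinator merges the partial results. Here is a minimal sketch of that split-then-merge pattern, using toy, hypothetical partitions that in practice would each live on a different machine:

    def partial_sum(partition):
        """Run independently on each node over its own partition of the data."""
        return sum(partition), len(partition)

    def merge(partials):
        """Combine per-partition results into one global analytic result."""
        total = sum(s for s, _ in partials)
        count = sum(n for _, n in partials)
        return total / count

    partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
    print(merge([partial_sum(p) for p in partitions]))  # 21 / 6 = 3.5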
If your working data set is never going to grow beyond 40 to 50 GB over the course of
its lifetime, then you can use an RDBMS or a specialized analytic appliance and
have 100 percent consistent analytic results delivered to you in real time, because your
entire working data set can fit into memory on a single machine and doesn't need to be
distributed.
However, if you're building an application with a rapidly growing data set and
unpredictable burst loads, you're going to need a system that sacrifices some speed or
consistency in order to be distributed so it can handle the large volume of raw data.