and the analysis can proceed as it does in the batch case. If a batch study
has already been performed on an earlier dataset, it can be used to inform
the streaming analysis. However, it is often not known whether the common states
in the current data will remain the common states in future data. In fact,
changes in the mix of states might be the very metric of interest. More
commonly, there is no previous data on which to perform an analysis. In this
case, the streaming system must attempt to deal with the data at its natural
cardinality.
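As a rough illustration of how an earlier batch study might inform a streaming analysis, the following Python sketch takes the k most frequent states from a batch of historical data and, in the stream, counts only those states exactly while lumping everything else into a single catch-all bucket. The names `common_states`, `BoundedStateCounter`, and the `<other>` label are hypothetical, chosen for this example only. Tracking the share of events that land in the catch-all bucket gives a crude signal that the mix of states has drifted away from what the batch study saw.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

OTHER = "<other>"  # hypothetical label for states outside the known set


def common_states(batch: Iterable[str], k: int) -> set[str]:
    """Pick the k most frequent states from an earlier batch study."""
    return {state for state, _ in Counter(batch).most_common(k)}


class BoundedStateCounter:
    """Counts known states exactly; every other state shares one bucket."""

    def __init__(self, known: set[str]) -> None:
        self.known = known
        self.counts: Counter[str] = Counter()
        self.total = 0

    def update(self, state: str) -> None:
        # Memory stays bounded by len(known) + 1 keys, whatever the stream holds.
        key = state if state in self.known else OTHER
        self.counts[key] += 1
        self.total += 1

    def other_fraction(self) -> float:
        """Share of events outside the known states; a rising value hints at drift."""
        return self.counts[OTHER] / self.total if self.total else 0.0


known = common_states(["CA", "NY", "CA", "TX", "CA", "NY"], k=2)  # {"CA", "NY"}
counter = BoundedStateCounter(known)
for state in ["CA", "WA", "NY", "OR"]:
    counter.update(state)
print(counter.other_fraction())  # 0.5
```

The trade-off is explicit: anything outside the batch-derived set loses its identity, so this only works when the batch study is a reasonable guide to the future stream.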
Dealing with data at its natural cardinality is difficult in terms of both
processing and storage. Processing anything that involves a large number of
distinct states necessarily takes time. Storing information about each
distinct state also requires space that grows linearly with the number of
states, and that space is much more restricted than in the batch setting
because a streaming system must generally use very fast main memory instead
of the much slower secondary storage of hard drives. This constraint has been
relaxed somewhat with the introduction of high-performance Solid State Drives
(SSDs), but they are still orders of magnitude slower than memory access.
As a result, a major topic of research in streaming data is how to deal with
high-cardinality data. This topic discusses some of the approaches to dealing
with the problem. As an active area of research, more solutions are being
developed and improved every day.
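To make the storage problem concrete, here is a minimal Python sketch of one widely used approximate structure, the Count-Min sketch. It is offered as an illustration of the general idea rather than as the specific method this topic goes on to describe. With width w and depth d it keeps w × d counters no matter how many distinct states appear; collisions can inflate a counter but never shrink it, so estimates may overcount but never undercount. The class name, default sizes, and the use of salted BLAKE2b hashes are assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counts in fixed memory (width * depth counters)."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One hash function per row, derived by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions only add to counters, so the smallest row value is the tightest estimate.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


cms = CountMinSketch(width=256, depth=4)
for state in ["CA"] * 5 + ["NY"] * 3:
    cms.add(state)
print(cms.estimate("CA"))  # at least 5; exactly 5 unless "CA" collides with "NY" in every row
```

Increasing the width reduces how much collisions inflate an estimate, and increasing the depth reduces the chance that every row suffers a bad collision at once; both cost memory, but that cost is fixed up front rather than growing with the cardinality of the stream.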
Infrastructures and Algorithms
The intent of this topic is to provide the reader with the ability to implement
a streaming data project from start to finish. An algorithm without an
infrastructure is, perhaps, an interesting research paper, but not a finished
system. An infrastructure without an application is mostly just a waste of
resources.
The approach of “build it and they will come” really isn't going to work if you
focus solely on an algorithm or an infrastructure. Instead, a tangible system
must be built implementing both the algorithm and the infrastructure
required to support it. With an example in place, other people will be able
to see how the pieces fit together and add their own areas of interest to
the infrastructure. One important thing to remember when building the
infrastructure (and it bears repeating) is that the goal is to make the
infrastructure and algorithms accessible to a variety of users in an