By this point in the book, we're sure that you understand the importance
of the Big Data velocity characteristic, and how the integration of at-rest
analytics, moved to the frontier of your business through an in-motion
platform, can really boost your Big Data IQ. For this reason, this chapter is
a little more detailed, from a technology perspective, than the other chapters
in this book. But let's start by clarifying what we mean by Streams and
stream: the capitalized form refers to the IBM InfoSphere Streams product, and
the lowercase version refers to a stream of data. With that in mind, let's look
at the basics of Streams, some of the technical underpinnings that define how
it works, and its use cases.
The Basics: InfoSphere Streams
Streams is a powerful analytic computing software platform that continuously
analyzes and transforms data in memory before it is stored on disk.
Instead of gathering large quantities of data, manipulating and storing it on
disk, and then analyzing it, as is the case with other analytic approaches,
Streams enables you to apply the analytics directly on data in motion. When
you analyze data in motion with Streams, you get the fastest possible results,
huge potential hardware savings, and the highest throughput.
With all of those advantages, you might ask “What's the catch?” To achieve
these benefits, Streams operates primarily on “windows” of data that are
maintained in memory across a cluster. Nevertheless, large memory sizes
enable windows of data to represent analytics over a few seconds to a few
days of data, depending on the data flow rates. This data can be enriched
with context that has been accumulated during processing and with data
from an at-rest engine, such as Hadoop, or a database.
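To make the idea of an in-memory window concrete, here is a minimal sketch in plain Python. It is not Streams code and does not use any Streams API; the class name SlidingTimeWindow and its methods are hypothetical, chosen only for illustration. It assumes that tuples arrive in timestamp order and that the window keeps whatever arrived within the last span_seconds.

```python
from collections import deque


class SlidingTimeWindow:
    """Hypothetical in-memory window: keeps values from the last span_seconds."""

    def __init__(self, span_seconds):
        self.span_seconds = span_seconds
        self._items = deque()  # (timestamp, value) pairs, oldest first
        self._total = 0.0      # running sum of the values currently in the window

    def insert(self, timestamp, value):
        """Add a new tuple, then evict everything that has aged out."""
        self._items.append((timestamp, value))
        self._total += value
        cutoff = timestamp - self.span_seconds
        # Assumes timestamps never decrease, so the oldest tuple is at the left.
        while self._items and self._items[0][0] <= cutoff:
            _, old_value = self._items.popleft()
            self._total -= old_value

    def mean(self):
        """Average of the values in the window, or None if it is empty."""
        if not self._items:
            return None
        return self._total / len(self._items)

    def __len__(self):
        return len(self._items)
```

Because the running sum is updated on every insert and eviction, the aggregate is available at any moment without rescanning the data. The memory needed grows with the window span multiplied by the arrival rate, which is why the time range that a window can cover depends on the data flow rate.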
We generally recommend Streams for the following use cases:
• Identifying events in real time, such as determining when customer
  sentiment in social media is becoming more negative
• Correlating and combining events that are closely related in time, such
  as a warning in a log file followed by a system outage
• Continuously calculating grouped aggregates, such as price trends per
  symbol per industry in the stock market
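Two of these use cases can be sketched in a few lines of plain Python. This is an illustration only, not Streams code: the function names correlate and running_group_mean, the tuple layouts, and the max_gap_seconds parameter are all assumptions made for the example. Both functions are generators, so they emit a result as each input tuple arrives rather than after the whole data set has been collected.

```python
from collections import defaultdict


def correlate(events, max_gap_seconds):
    """Yield (warning, outage) pairs for the same host that are close in time.

    Each event is assumed to be a (timestamp, host, kind) tuple, with events
    arriving in timestamp order.
    """
    last_warning = {}  # host -> most recent warning event seen for that host
    for event in events:
        timestamp, host, kind = event
        if kind == "warning":
            last_warning[host] = event
        elif kind == "outage":
            warning = last_warning.get(host)
            if warning is not None and timestamp - warning[0] <= max_gap_seconds:
                yield warning, event


def running_group_mean(trades):
    """Yield (industry, symbol, mean price so far) after every trade.

    Each trade is assumed to be an (industry, symbol, price) tuple.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (industry, symbol) -> [sum, count]
    for industry, symbol, price in trades:
        acc = totals[(industry, symbol)]
        acc[0] += price
        acc[1] += 1
        yield industry, symbol, acc[0] / acc[1]
```

Each function keeps only a small amount of state per key (the last warning per host, or a sum and a count per group), so the input can be an unbounded stream.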
 