We don't generally recommend Streams for analytics that require multiple
passes over a large data set, so scoring a regression model is possible, but
building one should be done using BigInsights or one of the purpose-built
IBM analytic data warehousing engines with data at rest. However, Streams
can take advantage of rich historical context by using models built by BigInsights,
the IBM PureData System for Operational Analytics (formerly known
as the IBM Smart Analytics System), the IBM PureData System for Analytics,
or other analytic tools, such as SPSS.
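To make the build-versus-score split concrete, here is a minimal sketch in plain Python (not SPL, and not the Streams API). It assumes a linear regression model was already fitted elsewhere on data at rest and that only its coefficients were exported; the coefficient values, field names, and function names are all hypothetical. Scoring needs just one pass over each event as it arrives, which is why it suits data in motion.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical coefficients, assumed to have been fitted offline
# (for example, in a warehouse or batch analytics tool) and exported.
INTERCEPT = 1.5
COEFFICIENTS: Dict[str, float] = {"temperature": 0.25, "pressure": -0.5}

Event = Dict[str, float]


def score(event: Event) -> float:
    """Apply the pre-built linear model to a single event."""
    return INTERCEPT + sum(
        weight * event[name] for name, weight in COEFFICIENTS.items()
    )


def score_stream(events: Iterable[Event]) -> Iterator[Tuple[Event, float]]:
    """Yield (event, prediction) pairs one at a time, without storing the stream."""
    for event in events:
        yield event, score(event)


if __name__ == "__main__":
    # Stand-in for a live feed; a real source would be unbounded.
    incoming = [
        {"temperature": 20.0, "pressure": 2.0},
        {"temperature": 22.0, "pressure": 4.0},
    ]
    for event, prediction in score_stream(incoming):
        print(event, prediction)
```

Fitting the coefficients, by contrast, would require repeated passes over the full historical data set, which is the part better left to an engine that works with data at rest.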
If you're already familiar with Complex Event Processing (CEP) systems,
you might see some similarities in Streams. However, Streams is designed to
be much more scalable and dynamic, to enable more complex analytics, and
to support a much higher data flow rate than other systems. Many CEP or
stream processing systems, including new open source projects like Storm,
advertise a few hundred thousand events per second within a whole cluster. In
contrast, the IBM Streams technology has been demonstrated to handle a few
million events per second on a single server; it is fast. (Don't forget, you can
deploy Streams in a cluster with near-linear scalability.)
In addition, Streams has much better enterprise-level characteristics,
such as high availability, a rich and easy-to-use application development
tool set, numerous out-of-the-box analytics, and integration with many
common enterprise systems. In fact, Streams provides nearly 30 built-in
operators in its standard toolkit, dozens of operators in extension toolkits
such as data mining and text analytics, and literally hundreds of functions
to facilitate application development. There is even a CEP toolkit that provides
Streams functions normally found in CEP systems. Even though other
systems can't match the power of Streams, the emergence of CEP systems
and open source projects, such as Storm, highlights the growing importance
of data-in-motion analysis.
You can think of a Streams application as a set of interconnected operators.
Multiple operators are combined into a single, configurable deployment unit
called a job, and an application is made up of one or more jobs. An application
is normally deployed until it is cancelled, continuously processing the data
that flows through it. The operators that bring data into Streams applications
are typically referred to as source adapters. These operators read the input stream