Real-Time Analytical Processing with InfoSphere Streams - Harness the Power of Big Data


The answer is windows of data. A window of data is a finite sequence of tuples and looks a lot like a database view. Windows are continuously updated as new data arrives, by eliminating the oldest tuples and adding the newest tuples. Windows can be easily configured in many ways. For example, the window size can be defined as N tuples long or M seconds long. Windows can be advanced in many ways, including one tuple at a time, or by replacing an entire window at once. Each time the window is updated, you can think of it as a temporarily frozen view. It's easy to correlate a frozen view with another window of data from a different stream, or to compute aggregates using techniques similar to those used for aggregates and joins in relational databases. The windowing libraries in Streams make building such applications considerably more productive. We discuss windowing later in this chapter where we talk about the various operators, but it's an important concept to understand, because Streams is not just about manipulating one tuple at a time, but rather about analyzing large sets of data in real time and gaining insight from analytics across multiple tuples, streams, and context data.
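To make the two ways of advancing a window concrete, here is a minimal sketch in plain Python. It is not the Streams windowing API: the function names `sliding_count_window` and `tumbling_count_window`, the window size of 3, and the sample readings are all assumptions chosen for illustration. It shows a count-based window that advances one tuple at a time and one that is replaced all at once, with an average computed over each frozen view.

```python
from collections import deque
from typing import Iterable, Iterator, List


def sliding_count_window(stream: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield a snapshot of the last `size` tuples each time a new tuple arrives."""
    window = deque(maxlen=size)  # appending to a full deque evicts the oldest tuple
    for item in stream:
        window.append(item)
        if len(window) == size:
            yield list(window)  # copy = the "temporarily frozen view"


def tumbling_count_window(stream: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield each full batch of `size` tuples, then start again with an empty window."""
    batch: List[float] = []
    for item in stream:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []  # the entire window is replaced at once
    # A trailing partial batch is dropped in this sketch.


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Hypothetical sample data, standing in for tuples arriving on a stream.
readings = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
sliding_means = [mean(w) for w in sliding_count_window(readings, 3)]
tumbling_means = [mean(w) for w in tumbling_count_window(readings, 3)]
```

With these six readings, the sliding window produces four overlapping views and the tumbling window produces two disjoint ones. A time-based window (M seconds long) would evict by timestamp instead of by count.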

Streams also has the concept of composite operators. A composite operator consists of a reusable and configurable Streams subgraph. Technically, all Streams applications contain at least one composite (the main composite for the application), but they can include more than one composite (composites can also be nested). A composite defines zero or more input streams and zero or more output streams. Streams can be passed to the inputs of the composite and are connected to inputs in the internal subgraph. Outputs from the internal subgraph are similarly connected to the composite outputs. A composite can expose parameters that are used to customize its behavior. An extreme example of nested composites is the Matryoshka sample application in Streams, which is inspired by the Matryoshka (or Russian) dolls, the famous wooden dolls of decreasing size that are stored inside one another.
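The way a composite wires its input to an internal subgraph and exposes parameters can be sketched in plain Python. This is only an analogy, not Streams code: `filter_op`, `map_op`, and the composite `threshold_scaler` with its `threshold` and `factor` parameters are hypothetical names invented for this illustration, and operators are modeled simply as functions from one stream of integers to another.

```python
from typing import Callable, Iterable, Iterator

# An operator is modeled as a function that consumes one stream and produces another.
Operator = Callable[[Iterable[int]], Iterator[int]]


def filter_op(predicate: Callable[[int], bool]) -> Operator:
    """Primitive operator: pass through only the tuples that satisfy `predicate`."""
    def run(stream: Iterable[int]) -> Iterator[int]:
        return (t for t in stream if predicate(t))
    return run


def map_op(fn: Callable[[int], int]) -> Operator:
    """Primitive operator: transform every tuple with `fn`."""
    def run(stream: Iterable[int]) -> Iterator[int]:
        return (fn(t) for t in stream)
    return run


def threshold_scaler(threshold: int, factor: int) -> Operator:
    """Hypothetical composite: one input, one output, two exposed parameters."""
    keep = filter_op(lambda t: t >= threshold)  # internal subgraph, step 1
    scale = map_op(lambda t: t * factor)        # internal subgraph, step 2

    def run(stream: Iterable[int]) -> Iterator[int]:
        # Composite input -> internal subgraph -> composite output.
        return scale(keep(stream))
    return run


# The same composite is reused twice with different parameter values.
scaled = list(threshold_scaler(threshold=3, factor=10)([1, 2, 3, 4, 5]))
doubled = list(threshold_scaler(threshold=0, factor=2)([1, 2]))
```

Callers see only the composite's input, output, and parameters. The two internal operators stay hidden and can be rearranged without affecting the code that uses the composite.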

Consider a simple example in which a composite operator, PetaOp, contains a subgraph consisting of a single composite operator TeraOp, which in turn contains a single composite operator GigaOp, and so on. This example is illustrated in Figure 6-2, which shows an application deployed with just the PetaOp operator (on the top) contrasted with a fully expanded composite operator (on the bottom). You can see that composites are very powerful and useful for big applications, because subtasks can be encapsulated and hidden, enabling developers to focus on the broader goals.
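The contrast between the collapsed and the fully expanded view can be modeled with a small tree structure. The sketch below is plain Python, not a Streams facility: the `Op` class and its methods are assumptions made for illustration. Only the names PetaOp, TeraOp, and GigaOp come from the example above. The chain is cut off after GigaOp with a placeholder primitive named `Leaf`, which is purely hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    """An operator; a composite has children, a primitive operator has none."""
    name: str
    children: List["Op"] = field(default_factory=list)

    def expand(self) -> List[str]:
        """Names of the primitive operators reached by fully expanding this operator."""
        if not self.children:
            return [self.name]
        leaves: List[str] = []
        for child in self.children:
            leaves.extend(child.expand())
        return leaves

    def depth(self) -> int:
        """Number of nesting levels, counting this operator itself."""
        return 1 + max((child.depth() for child in self.children), default=0)

    def outline(self, indent: int = 0) -> List[str]:
        """Indented listing of the nesting, one line per operator."""
        lines = ["  " * indent + self.name]
        for child in self.children:
            lines.extend(child.outline(indent + 1))
        return lines


leaf = Op("Leaf")            # hypothetical primitive operator ending the chain
giga = Op("GigaOp", [leaf])
tera = Op("TeraOp", [giga])
peta = Op("PetaOp", [tera])

collapsed_view = peta.name       # what a developer sees at the top level
expanded_view = peta.outline()   # every nesting level made visible
```

Here `collapsed_view` corresponds to working with PetaOp alone, while `expanded_view` lists every level nested inside it, much like opening each doll in turn.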
