the stream of data moving between the operators. Each output of an operator
defines a new stream, and other operators can connect to the stream. Operators
occurring early in a pipeline can even connect to a stream that is produced by
“downstream” operators, enabling control flows to change the computation
of upstream operators as new insights are uncovered. Figure 6-1 represents a
simple stream graph that reads data from a file, sends the data to an operator
known as a functor (an operator that transforms incoming data programmatically),
and then feeds that data to another operator. In this figure, the streamed data
is fed to a split operator, which then sends data to either a file sink or a
database, depending on the logic inside the split operator.
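To make the flow in Figure 6-1 more concrete, here is a minimal Python sketch that mimics the same graph with generators. It is not Streams code (Streams applications are written in the Streams Processing Language, SPL). The functions `file_source`, `functor`, and `split`, and the two list-based sinks are hypothetical stand-ins: a list of strings replaces the input file, and the lists stand in for the output file and the database table. Python generators also run in a single process and are pulled by the consumer, whereas Streams moves tuples between operators that may be distributed. The feedback connections to upstream operators are not modeled.

```python
from typing import Callable, Iterable, Iterator

def file_source(lines: Iterable[str]) -> Iterator[str]:
    """Stand-in for FileSource: emits one tuple per input line."""
    for line in lines:
        yield line.rstrip("\n")

def functor(stream: Iterable[str], fn: Callable[[str], str]) -> Iterator[str]:
    """Stand-in for Functor: applies a transformation to every tuple."""
    for tup in stream:
        yield fn(tup)

def split(stream: Iterable[str], predicate: Callable[[str], bool],
          on_true: Callable[[str], None], on_false: Callable[[str], None]) -> None:
    """Stand-in for Split: routes each tuple to exactly one of two outputs."""
    for tup in stream:
        (on_true if predicate(tup) else on_false)(tup)

# Hypothetical sinks: plain lists stand in for the output file and the database.
file_sink: list[str] = []
database_sink: list[str] = []

raw = ["error disk full\n", "info started\n", "error timeout\n"]
split(
    functor(file_source(raw), str.upper),
    predicate=lambda t: t.startswith("ERROR"),
    on_true=database_sink.append,  # what an ODBCAppend-like sink would receive
    on_false=file_sink.append,     # what a FileSink-like sink would receive
)
print(database_sink)  # ['ERROR DISK FULL', 'ERROR TIMEOUT']
print(file_sink)      # ['INFO STARTED']
```

The routing decision lives entirely in the predicate passed to `split`, which mirrors the point that the output each tuple reaches depends on the logic inside the split operator.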
The data elements in a stream are known as tuples. In a relational database
sense, you can think of a tuple as similar to a row of data. However, when
Streams works on semistructured and unstructured data, a tuple is an
abstraction that represents a package of data, and that's why we think of a tuple as
a set of attributes for a given object. Each element in the tuple contains the
value for that attribute, which can be a character string, a number, a date, or
even some sort of binary object, such as a video frame. With semistructured
data, Streams applications commonly start with tuples that consist of a small
amount of metadata coupled with an unstructured payload; subsequent operators
then progressively extract more information from that payload.
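The idea of a tuple as a set of attributes, starting as a little metadata plus an unstructured payload that later operators mine for more attributes, can be sketched in Python with one dictionary per tuple. This is an illustrative assumption, not how Streams represents tuples (SPL tuples have declared, typed schemas). The attribute names, the sample payload, and the helpers `ingest`, `extract_user`, and `extract_amount` are all hypothetical.

```python
import re
from datetime import datetime
from typing import Any, Dict

# A tuple as a set of attributes: name -> value (string, number, date, bytes, ...).
StreamTuple = Dict[str, Any]

def ingest(source: str, received: datetime, payload: str) -> StreamTuple:
    # Hypothetical first operator: a small amount of metadata plus the raw payload.
    return {"source": source, "received": received, "payload": payload}

def extract_user(tup: StreamTuple) -> StreamTuple:
    # Hypothetical downstream operator: adds a "user" attribute parsed from the payload.
    match = re.search(r"@(\w+)", tup["payload"])
    return {**tup, "user": match.group(1) if match else None}

def extract_amount(tup: StreamTuple) -> StreamTuple:
    # Hypothetical next operator: adds a numeric "amount" attribute, keeping earlier ones.
    match = re.search(r"\$(\d+(?:\.\d+)?)", tup["payload"])
    return {**tup, "amount": float(match.group(1)) if match else None}

t = ingest("web", datetime(2012, 1, 1, 12, 0), "@alice paid $19.99 for a ticket")
t = extract_amount(extract_user(t))
print(t["user"], t["amount"])  # alice 19.99
```

Each operator returns a new tuple carrying all earlier attributes plus one more, so the structure grows as the tuple moves downstream while the original payload stays available to later operators.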
The simplest operators work on one tuple at a time. These operators can
filter a tuple based on characteristics of its attributes, extract additional
information from the tuple, and transform the tuple before sending data to an output
stream. Because a stream consists of a never-ending sequence of tuples, how
can you correlate across different streams, sort tuples, or compute aggregates?
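A one-tuple-at-a-time operator can be sketched as a plain function applied to each tuple in turn. In this hypothetical Python example, `process` filters out readings whose temperature is missing, extracts a site name from the sensor ID, and transforms the tuple into a new shape before it is emitted. The sensor schema, the helper names, and the sample readings are made up for illustration.

```python
from typing import Iterable, Iterator, Optional

def process(tup: dict) -> Optional[dict]:
    """Hypothetical per-tuple operator: filter, extract, transform."""
    # Filter: drop tuples whose attributes don't qualify.
    if tup.get("temp_c") is None:
        return None
    # Extract: derive an additional attribute from an existing one.
    site = tup["sensor"].split("-")[0]
    # Transform: reshape the tuple before it goes to the output stream.
    return {"site": site, "temp_f": tup["temp_c"] * 9 / 5 + 32}

def run(stream: Iterable[dict]) -> Iterator[dict]:
    # Each tuple is handled independently; nothing is remembered between tuples.
    for tup in stream:
        out = process(tup)
        if out is not None:
            yield out

readings = [
    {"sensor": "north-01", "temp_c": 20.0},
    {"sensor": "south-07", "temp_c": None},
    {"sensor": "north-02", "temp_c": 25.0},
]
print(list(run(readings)))
# [{'site': 'north', 'temp_f': 68.0}, {'site': 'north', 'temp_f': 77.0}]
```

Because `run` keeps no state between tuples, it has nothing to sort, correlate, or aggregate over, which is exactly the limitation raised by the question about never-ending streams.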
Figure 6-1 A simple data stream that applies a transformation to data and splits it into
two possible outputs based on predefined logic (FileSource → Functor → Split → FileSink or ODBCAppend)