Database Reference
In-Depth Information
• The
StreamSystemPartition
object, accessible via the
getSystemStreamPartition
method. This contains metadata about
the message, such as its originating partition and offset information.
Most tasks do not need to access this information directly.
The
SystemProducer
is usually accessed through a
SystemStream
defined as a static final member of the task's class. For example, to define
a Kafka output stream on the topic “
output
” add the following line to the
class definition:
private static final SystemStream OUTPUT =
new SystemStream("kafka","output");
This
SystemStream
object is used to initialize an
OutgoingMessageEnvelope
along with the message payload and an
optional key payload. This object is then sent to the
MessageCollector
,
which manages placing the outgoing message on the specified stream.
Implementing a Stream Task
The task itself is defined by the StreamTask interface, which is implemented
by all task classes. This interface has a single process method, which takes
IncomingMessageEnvelope
,
MessageCollector
, and
TaskCoordinator
as arguments. The
IncomingMessageEnvelope
is
a message from the
task.inputs
streams as described in the previous
section, whereas the
MessageCollector
is used to emit events to output
streams, which are defined within the class.
For example, this task handles the task of splitting the input of the
wikipedia-raw
topic into words, the same way that words were split up
in the Trident example earlier in this chapter. In this case the JSON SerDe
is being used to parse the incoming data (which is in JSON) and sent to the
wikipedia-words
topic.
public class
WordSplitTask
implements
StreamTask {
private static final
SystemStream
OUTPUT_STREAM
=
new
SystemStream("kafka","wikipedia-words");
public void
process(IncomingMessageEnvelope envelope,
MessageCollector collector,
TaskCoordinator coordinator)
throws
Exception {