A Filter and Split Bolt
This example shows a bolt that takes an input stream and splits it into
several different streams according to a regular expression on a field. It
also shows how to introspect the incoming tuples to define the output
streams programmatically.
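
The splitting idea can be sketched without any Storm classes. The following is only an illustration, not the bolt from this example: the class name RegexSplitSketch and the methods addRule and route are hypothetical, records are modeled as plain maps, and the sketch assumes that a value must match the whole regular expression and that a record may be routed to more than one stream. The actual bolt may differ on both points.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical, Storm-free illustration of regex-based stream splitting.
public class RegexSplitSketch {
    // One rule per output stream: which field to check and the pattern to apply.
    private static final class Rule {
        final String fieldName;
        final Pattern regexp;

        Rule(String fieldName, Pattern regexp) {
            this.fieldName = fieldName;
            this.regexp = regexp;
        }
    }

    // Rules keyed by output stream name; insertion order is preserved.
    private final Map<String, Rule> rules = new LinkedHashMap<>();

    // Register a rule: records whose field matches the regexp go to the stream.
    public void addRule(String stream, String fieldName, String regexp) {
        rules.put(stream, new Rule(fieldName, Pattern.compile(regexp)));
    }

    // Return every stream whose rule matches the record.
    // Assumptions: full-string match, and multiple streams may match.
    public List<String> route(Map<String, String> record) {
        List<String> streams = new ArrayList<>();
        for (Map.Entry<String, Rule> entry : rules.entrySet()) {
            Rule rule = entry.getValue();
            String value = record.get(rule.fieldName);
            if (value != null && rule.regexp.matcher(value).matches()) {
                streams.add(entry.getKey());
            }
        }
        return streams;
    }

    public static void main(String[] args) {
        RegexSplitSketch splitter = new RegexSplitSketch();
        splitter.addRule("errors", "level", "ERROR|FATAL");
        splitter.addRule("auth", "source", "auth.*");

        Map<String, String> record = new LinkedHashMap<>();
        record.put("level", "ERROR");
        record.put("source", "auth-service");
        System.out.println(splitter.route(record));
    }
}
```

Records that match no rule yield an empty list here; whether the real bolt drops such tuples or forwards them elsewhere is not stated in the text.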
The filters are defined during the configuration of the topology. The
bolt defines a simple inner class called FilterDefinition, which
holds the output stream, the name of the field to check, and the
regular expression that will be evaluated against the field's value. This
class implements Serializable so it will be properly serialized when
the bolts are distributed across the cluster.
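
To see why implementing Serializable is enough for such a class, the sketch below round-trips a similar object through standard Java serialization. It is an illustration under stated assumptions: SerializationSketch, Definition, and roundTrip are hypothetical names, the Storm-specific Fields member is left out so the code runs without Storm on the classpath, and standard Java serialization is assumed to be the mechanism involved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.regex.Pattern;

// Hypothetical sketch: round-trip a filter definition through Java serialization.
public class SerializationSketch {
    // Stand-in for FilterDefinition; the Storm Fields member is omitted.
    public static class Definition implements Serializable {
        private static final long serialVersionUID = 1L;
        public String stream;
        public String fieldName;
        public Pattern regexp; // java.util.regex.Pattern is itself Serializable
    }

    // Write the object to a byte array and read a fresh copy back.
    public static Definition roundTrip(Definition def) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(def);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Definition) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Definition def = new Definition();
        def.stream = "errors";
        def.fieldName = "level";
        def.regexp = Pattern.compile("ERROR|FATAL");

        Definition copy = roundTrip(def);
        // The compiled pattern survives the round trip and still matches.
        System.out.println(copy.stream + " " + copy.regexp.pattern()
                + " " + copy.regexp.matcher("ERROR").matches());
    }
}
```

The sketch uses a static nested class. A non-static inner class carries an implicit reference to its enclosing instance, so serializing it also serializes that enclosing object, which must then be serializable as well.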
The filter method itself is implemented in a “chainable” style to
make it easy to use in topology definitions:
    public class FilterBolt extends BaseRichBolt {
        private static final long serialVersionUID = -7739856267277627178L;

        public class FilterDefinition implements Serializable {
            public String stream;
            public String fieldName;
            public Pattern regexp;
            public Fields fields;
            private static final long serialVersionUID = 1L;
        }

        ArrayList<FilterDefinition> filters = new ArrayList<FilterDefinition>();

        public FilterBolt filter(String stream, String field, String regexp,
                String... fields) {
            FilterDefinition def = new FilterDefinition();
            def.stream = stream;
            def.fieldName = field;
            def.regexp = Pattern.compile(regexp);