Cascading in Practice
Now that we know what Cascading is and have a good idea of how it works, what does an
application written in Cascading look like? See Example 24-1.
Example 24-1. Word count and sort
// Source tap: read text files from inputPath, one tuple per line in field "line"
Scheme sourceScheme =
  new TextLine(new Fields("line"));
Tap source =
  new Hfs(sourceScheme, inputPath);

// Sink tap: write text to outputPath, replacing any existing output
Scheme sinkScheme = new TextLine();
Tap sink =
  new Hfs(sinkScheme, outputPath, SinkMode.REPLACE);

// Head of the pipe assembly
Pipe assembly = new Pipe("wordcount");

// Split each "line" into words: one output tuple per regex match, in field "word"
String regexString = "(?<!\\pL)(?=\\pL)[^ ]*(?<=\\pL)(?!\\pL)";
Function regex = new RegexGenerator(new Fields("word"), regexString);
assembly =
  new Each(assembly, new Fields("line"), regex);

// Group the tuples by "word"
assembly =
  new GroupBy(assembly, new Fields("word"));

// Count the tuples in each group, storing the result in field "count"
Aggregator count = new Count(new Fields("count"));
assembly = new Every(assembly, count);

// Group by "count" and secondary-sort by "word"
assembly =
  new GroupBy(assembly, new Fields("count"), new Fields("word"));

// Plan the flow that connects source, assembly, and sink, then run it to completion
FlowConnector flowConnector = new FlowConnector();
Flow flow =
  flowConnector.connect("word-count", source, sink, assembly);
flow.complete();
We create a new Scheme that reads simple text files and emits a new Tuple for each
line in a field named “line,” as declared by the Fields instance.
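To make the tuple-per-line model and the stages that follow it in Example 24-1 more concrete, here is a minimal sketch of the same logical steps in plain Java, running in memory with no Cascading or Hadoop involved. The class name WordCountSketch, the method countAndSort, and the "word, tab, count" row layout are our own assumptions for illustration; only the regular expression is taken from the example. We also assume ascending order for the final sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical in-memory analogue of Example 24-1 (not Cascading code).
public class WordCountSketch {

    // Same regular expression as in Example 24-1: a run of non-space
    // characters that starts and ends with a letter.
    static final Pattern WORD =
        Pattern.compile("(?<!\\pL)(?=\\pL)[^ ]*(?<=\\pL)(?!\\pL)");

    // Returns rows of the form "word<TAB>count", ordered by count and then
    // by word (both ascending; the row layout is an assumption).
    static List<String> countAndSort(List<String> lines) {
        // Treat each input line as one record, emit one word per regex
        // match, and count the occurrences of each distinct word.
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            Matcher matcher = WORD.matcher(line);
            while (matcher.find()) {
                counts.merge(matcher.group(), 1L, Long::sum);
            }
        }

        // Order by count first, then by word within equal counts.
        List<Map.Entry<String, Long>> entries =
            new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(a.getValue(), b.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Long> entry : entries) {
            rows.add(entry.getKey() + "\t" + entry.getValue());
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines =
            Arrays.asList("the quick fox", "the lazy dog, the end");
        for (String row : countAndSort(lines)) {
            System.out.println(row);
        }
    }
}
```

The sketch holds everything in a single map, which only works for small inputs. The Cascading version expresses the same steps as a pipe assembly so that the grouping and counting can be planned as a flow and run on Hadoop.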