lowercase, but we realize that all future applications that need to parse words from documents should have the same behavior, so we'll instead create a reusable pipe called SubAssembly, just as we would create a subroutine in a traditional application (see Example 24-2).
Example 24-2. Creating a SubAssembly
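import cascading.operation.Function;
import cascading.operation.expression.ExpressionFunction;
import cascading.operation.regex.RegexGenerator;
import cascading.pipe.Each;
import cascading.pipe.Pipe;
import cascading.pipe.SubAssembly;
import cascading.tuple.Fields;

public class ParseWordsAssembly extends SubAssembly
{
  public ParseWordsAssembly(Pipe previous)
  {
    // Emit one "word" tuple for each regex match in the "line" field
    String regexString = "(?<!\\pL)(?=\\pL)[^ ]*(?<=\\pL)(?!\\pL)";
    Function regex = new RegexGenerator(new Fields("word"), regexString);
    previous = new Each(previous, new Fields("line"), regex);

    // Lowercase each "word" value with a Java expression
    String exprString = "word.toLowerCase()";
    Function expression =
      new ExpressionFunction(new Fields("word"), exprString, String.class);
    previous = new Each(previous, new Fields("word"), expression);

    // Register the tail of this subassembly with the superclass
    setTails(previous);
  }
}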
We subclass the SubAssembly class, which is itself a kind of Pipe.
We create a Java expression function that will call toLowerCase() on the String value in the field named “word.” We must also pass in the Java type the expression expects “word” to be, in this case String. (Janino is used under the covers.)
We tell the SubAssembly superclass where the tail ends of our pipe subassembly are.
First, we create a SubAssembly pipe to hold our “parse words” pipe assembly. Because this is a Java class, it can be reused in any other application, as long as there is an incoming field named “line,” which is the field the first Each reads from (Example 24-3). Note that there are ways to make this function even more generic, but they are covered in the Cascading User Guide.
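To see what the two Each steps do to a single line of text, the following is a minimal plain-Java sketch that applies the same regular expression and the same toLowerCase() call using java.util.regex instead of Cascading. The class name WordParseSketch and the method parseWords are hypothetical names chosen for illustration; this is not how Cascading evaluates the operations internally, only an approximation of the per-line result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone illustration; not part of Cascading.
public class WordParseSketch {
    // Same pattern as in Example 24-2: a run of non-space characters
    // that starts and ends on a letter (\pL) boundary.
    private static final Pattern WORD =
        Pattern.compile("(?<!\\pL)(?=\\pL)[^ ]*(?<=\\pL)(?!\\pL)");

    // Roughly what the RegexGenerator step followed by the expression
    // step would yield for one "line" value: one lowercase word per match.
    public static List<String> parseWords(String line) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(line);
        while (matcher.find()) {
            words.add(matcher.group().toLowerCase());
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [the, quick, brown, fox]
        System.out.println(parseWords("The quick Brown fox."));
    }
}
```

The trailing period is dropped because the lookbehind (?<=\pL) forces each match to end on a letter, so the greedy [^ ]* backtracks past the punctuation.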
 
 
 
 