lowercase, but we realize that all future applications that need to parse words from documents should have the same behavior, so we'll instead create a reusable pipe called a SubAssembly, much as we would create a subroutine in a traditional application (see Example 24-2).
Example 24-2. Creating a SubAssembly
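import cascading.operation.Function;
import cascading.operation.expression.ExpressionFunction;
import cascading.operation.regex.RegexGenerator;
import cascading.pipe.Each;
import cascading.pipe.Pipe;
import cascading.pipe.SubAssembly;
import cascading.tuple.Fields;

public class ParseWordsAssembly extends SubAssembly {
  public ParseWordsAssembly(Pipe previous) {
    // Match a run of non-space characters that starts and ends on a letter
    // boundary (\pL is any Unicode letter), so surrounding punctuation is dropped
    String regexString = "(?<!\\pL)(?=\\pL)[^ ]*(?<=\\pL)(?!\\pL)";
    Function regex = new RegexGenerator(new Fields("word"), regexString);
    // Read the incoming "line" field and emit one "word" tuple per match
    previous = new Each(previous, new Fields("line"), regex);

    // Lowercase each word; String.class is the type of the "word" argument
    String exprString = "word.toLowerCase()";
    Function expression =
        new ExpressionFunction(new Fields("word"), exprString, String.class);
    previous = new Each(previous, new Fields("word"), expression);

    // Register the last pipe as the tail of this subassembly
    setTails(previous);
  }
}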
We subclass the SubAssembly class, which is itself a kind of Pipe.
We create a Java expression function that will call toLowerCase() on the String value in the field named “word.” We must also pass in the Java type the expression expects its argument to have (here, String).
We tell the SubAssembly superclass where the tail ends of our pipe subassembly are.
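To see what the two Each pipes do to a single “line” value without running a Cascading flow, here is a minimal plain-Java sketch that applies the same pattern string with java.util.regex. It assumes that RegexGenerator emits one tuple per match of the pattern; the class ParseWordsSketch and its parseWords method are hypothetical names, and Locale.ROOT is our own choice to keep lowercasing locale-independent (the expression in Example 24-2 calls plain toLowerCase()).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseWordsSketch {
    // Same pattern string as in Example 24-2: a match must begin and end on a letter
    static final Pattern WORD =
        Pattern.compile("(?<!\\pL)(?=\\pL)[^ ]*(?<=\\pL)(?!\\pL)");

    // Hypothetical stand-in for the two Each pipes: one output per regex match
    // (assumed RegexGenerator behavior), then each match lowercased
    static List<String> parseWords(String line) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            // Locale.ROOT is an assumption added for deterministic output
            words.add(m.group().toLowerCase(Locale.ROOT));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(parseWords("The quick, brown Fox isn't lazy."));
        // prints [the, quick, brown, fox, isn't, lazy]
    }
}
```

The pattern keeps the apostrophe inside “isn't” but drops the comma after “quick” and the period after “lazy”, because the lookbehind at the end of the pattern forces each match to stop on a letter.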
First, we create a SubAssembly pipe to hold our “parse words” pipe assembly. Because this is a Java class, it can be reused in any other application, as long as there is an incoming field named “line”, which is the argument field the first Each reads; the subassembly emits a single field named “word” (Example 24-3). There are ways to make this subassembly even more generic; they are covered in the Cascading User Guide.
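The reuse contract can be illustrated with a plain-Java analogy rather than Cascading code. In this sketch, tuples are modeled as maps from field name to value; a hypothetical parseWords step plays the role of ParseWordsAssembly by requiring an incoming “line” field and emitting tuples that carry only “word”, and a hypothetical wordCount “application” reuses that step by grouping on “word” and counting. ReuseSketch, parseWords, and wordCount are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReuseSketch {
    private static final Pattern WORD =
        Pattern.compile("(?<!\\pL)(?=\\pL)[^ ]*(?<=\\pL)(?!\\pL)");

    // Hypothetical stand-in for ParseWordsAssembly: every incoming tuple must
    // carry a "line" field; each output tuple contains only the field "word"
    static List<Map<String, String>> parseWords(List<Map<String, String>> tuples) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> tuple : tuples) {
            String line = tuple.get("line");
            if (line == null) {
                throw new IllegalArgumentException("incoming field \"line\" is required");
            }
            Matcher m = WORD.matcher(line);
            while (m.find()) {
                out.add(Map.of("word", m.group().toLowerCase(Locale.ROOT)));
            }
        }
        return out;
    }

    // Hypothetical application reusing the step: count occurrences of each word
    static Map<String, Integer> wordCount(List<Map<String, String>> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> t : parseWords(lines)) {
            counts.merge(t.get("word"), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> input = List.of(
            Map.of("line", "The cat saw the dog."),
            Map.of("line", "THE END"));
        System.out.println(wordCount(input));
        // prints {cat=1, dog=1, end=1, saw=1, the=3}
    }
}
```

Any other application that can supply a “line” field could call the same parseWords step unchanged, which is the point of packaging the logic once instead of copying it into each flow.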