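package com.hadoopbook.pig;

import org.apache.pig.PrimitiveEvalFunc;

// Eval function that strips leading and trailing whitespace from a chararray.
public class Trim extends PrimitiveEvalFunc<String, String> {
  @Override
  public String exec(String input) {
    return input.trim();
  }
}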
In this case, we have taken advantage of PrimitiveEvalFunc, which is a specialization of EvalFunc for when the input is a single primitive (atomic) type. For the Trim UDF, the type parameters show that both the input and the output are of type String.
In general, when you write an eval function, you need to consider what the output's schema looks like. In the following statement, the schema of B is determined by the function udf:

B = FOREACH A GENERATE udf($0);

If udf creates tuples with scalar fields, then Pig can determine B's schema through reflection. For complex types such as bags, tuples, or maps, Pig needs more help, and you should implement the outputSchema() method to give Pig the information about the output schema.
The Trim UDF returns a string, which Pig translates as a chararray, as can be seen from the following session:

grunt> DUMP A;
( pomegranate)
(banana )
(apple)
( lychee )
grunt> DESCRIBE A;
A: {fruit: chararray}
grunt> B = FOREACH A GENERATE com.hadoopbook.pig.Trim(fruit);
grunt> DUMP B;
(pomegranate)
(banana)
(apple)
(lychee)
grunt> DESCRIBE B;
B: {chararray}

A has chararray fields that have leading and trailing spaces. We create B from A by applying the Trim function to the first field in A (named fruit). B's fields are correctly inferred to be of type chararray.
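
The trimming seen in the session can be reproduced in plain Java without a Pig installation. The following is only a sketch: TrimSketch and its methods are hypothetical names, not part of Pig, and the null guard is an added assumption (the Trim UDF shown earlier calls input.trim() directly).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the Trim UDF; deliberately has no Pig dependency.
final class TrimSketch {

    // Same logic as Trim.exec(), plus a null guard (an assumption, not in the original).
    static String exec(String input) {
        return input == null ? null : input.trim();
    }

    // Applies exec() to each value, loosely like FOREACH ... GENERATE over one field.
    static List<String> execAll(List<String> inputs) {
        List<String> out = new ArrayList<>();
        for (String s : inputs) {
            out.add(exec(s));
        }
        return out;
    }

    public static void main(String[] args) {
        // The four fruit values from relation A, with their stray spaces.
        List<String> fruit = Arrays.asList(" pomegranate", "banana ", "apple", " lychee ");
        System.out.println(execAll(fruit)); // prints [pomegranate, banana, apple, lychee]
    }
}
```

Because the method takes and returns a String, the result type is a single scalar, which is the kind of output the text says Pig can map to chararray without an outputSchema() implementation.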