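package com.hadoopbook.pig;

import org.apache.pig.PrimitiveEvalFunc;

// Eval function that strips leading and trailing whitespace from a chararray.
public class Trim extends PrimitiveEvalFunc<String, String> {
  @Override
  public String exec(String input) {
    return input.trim();
  }
}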
In this case, we have taken advantage of PrimitiveEvalFunc, which is a specialization
of EvalFunc for when the input is a single primitive (atomic) type. For the Trim
UDF, the input and output types are both String. [102]
In general, when you write an eval function, you need to consider what the output's
schema looks like. In the following statement, the schema of B is determined by the
function udf:
B = FOREACH A GENERATE udf($0);
If udf creates tuples with scalar fields, then Pig can determine B's schema through
reflection. For complex types such as bags, tuples, or maps, Pig needs more help, and you
should implement the outputSchema() method to give Pig the information about the
output schema.
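To make the reflection step concrete, here is a minimal sketch in plain Java. It is not Pig's actual implementation: ReturnTypeSketch, TrimLike, pigTypeFor, and inferOutputType are hypothetical names invented for illustration, and the type mapping covers only a few scalar types. The idea it shows is that the declared return type of exec is visible at runtime, so a scalar result can be mapped to a Pig type without any extra help from the UDF author.

```java
import java.lang.reflect.Method;

public class ReturnTypeSketch {
    // Hypothetical stand-in for a UDF; deliberately not a real Pig class.
    static class TrimLike {
        public String exec(String input) {
            return input.trim();
        }
    }

    // Illustrative subset of the Java-to-Pig scalar type correspondence.
    static String pigTypeFor(Class<?> javaType) {
        if (javaType == String.class) return "chararray";
        if (javaType == Integer.class) return "int";
        if (javaType == Long.class) return "long";
        if (javaType == Float.class) return "float";
        if (javaType == Double.class) return "double";
        return "unknown";
    }

    // Looks up exec(String) reflectively and maps its declared return type.
    static String inferOutputType(Class<?> udfClass) {
        try {
            Method exec = udfClass.getDeclaredMethod("exec", String.class);
            return pigTypeFor(exec.getReturnType());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no exec(String) method found", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(inferOutputType(TrimLike.class));
    }
}
```

A return type such as a bag or tuple carries no field names or nested types in its Java signature, which is why those cases call for outputSchema().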
The Trim UDF returns a string, which Pig translates as a chararray, as can be seen
from the following session:
grunt> DUMP A;
( pomegranate)
(banana )
(apple)
( lychee )
grunt> DESCRIBE A;
A: {fruit: chararray}
grunt> B = FOREACH A GENERATE com.hadoopbook.pig.Trim(fruit);
grunt> DUMP B;
(pomegranate)
(banana)
(apple)
(lychee)
grunt> DESCRIBE B;
B: {chararray}
A has chararray fields that have leading and trailing spaces. We create B from A by
applying the Trim function to the first field in A (named fruit). B's fields are correctly
inferred to be of type chararray.