Leveraging types
The filter works when the quality field is declared to be of type int, but if the type information is absent, the UDF fails! This happens because the field is the default type, bytearray, represented by the DataByteArray class. Because DataByteArray is not an Integer, the cast fails.
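To see why the cast fails, the sketch below reproduces it outside Pig. FakeByteArray is a hypothetical stand-in for Pig's DataByteArray (it is not the real class), and tryCast is a hypothetical helper. The point is only that casting an object to Integer throws ClassCastException unless the object really is an Integer, whatever its bytes happen to contain.

```java
import java.nio.charset.StandardCharsets;

class CastDemo {
    // Mimics a UDF that assumes its argument is already an Integer.
    static boolean tryCast(Object field) {
        try {
            Integer i = (Integer) field; // throws unless field is an Integer (or null)
            return i != null;
        } catch (ClassCastException e) {
            return false; // the failure the text describes
        }
    }

    public static void main(String[] args) {
        Object typed = Integer.valueOf(1); // field declared as int
        // Field left as the default bytearray type (hypothetical wrapper).
        Object untyped = new FakeByteArray("1".getBytes(StandardCharsets.UTF_8));
        System.out.println(tryCast(typed));   // true
        System.out.println(tryCast(untyped)); // false: ClassCastException was thrown
    }
}

// Hypothetical stand-in for Pig's DataByteArray: it just wraps raw bytes.
final class FakeByteArray {
    final byte[] bytes;

    FakeByteArray(byte[] bytes) {
        this.bytes = bytes;
    }
}
```

Even though the wrapped bytes spell "1", the cast cannot succeed, because no conversion happens: a cast only checks the object's runtime class.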
The obvious way to fix this is to convert the field to an integer in the exec() method. However, there is a better way, which is to tell Pig the types of the fields that the function expects. The getArgToFuncMapping() method on EvalFunc is provided for precisely this reason. We can override it to tell Pig that the first field should be an integer:
@Override
public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
  List<FuncSpec> funcSpecs = new ArrayList<FuncSpec>();
  // Declare the expected input: a single anonymous field of type int
  funcSpecs.add(new FuncSpec(this.getClass().getName(),
      new Schema(new Schema.FieldSchema(null, DataType.INTEGER))));
  return funcSpecs;
}
This method returns a list of FuncSpec objects describing the fields of the tuple that is passed to the exec() method. Here there is a single field, so we construct an anonymous FieldSchema (the name is passed as null, since Pig ignores the name when doing type conversion). The type is specified using the INTEGER constant on Pig's DataType class.
With the amended function, Pig will attempt to convert the argument passed to the function to an integer. If the field cannot be converted, null is passed for the field. The exec() method always returns false when the field is null. For this application, this behavior is appropriate, as we want to filter out records whose quality field is unintelligible.
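The convert-then-filter behavior can be modeled without Pig. This is a sketch under stated assumptions: toInteger is a hypothetical helper that approximates a bytearray-to-int conversion by parsing the bytes as UTF-8 text and returning null when that fails (it is not Pig's actual converter), and the set of acceptable quality codes (0, 1, 4, 5, 9) is assumed for illustration. It requires Java 9 or later for Set.of.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;

class QualityFilterSketch {
    // Hypothetical set of "good" quality codes, assumed for illustration.
    private static final Set<Integer> GOOD_CODES = Set.of(0, 1, 4, 5, 9);

    // Approximates a bytearray-to-int conversion: parse the bytes as UTF-8 text,
    // yielding null (rather than throwing) when they do not form an integer.
    static Integer toInteger(byte[] raw) {
        if (raw == null) {
            return null;
        }
        try {
            return Integer.valueOf(new String(raw, StandardCharsets.UTF_8));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Mirrors the filter's contract: a null field always yields false.
    static boolean isGood(Integer quality) {
        // Check for null first; Set.of(...).contains(null) would throw.
        return quality != null && GOOD_CODES.contains(quality);
    }

    public static void main(String[] args) {
        String[] fields = {"1", "9", "2", "?", ""};
        for (String f : fields) {
            Integer q = toInteger(f.getBytes(StandardCharsets.UTF_8));
            System.out.println("\"" + f + "\" -> " + q + " -> " + isGood(q));
        }
    }
}
```

Unparseable values such as "?" become null and are therefore rejected by the filter, which is the outcome the text wants for unintelligible quality fields.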
An Eval UDF
Writing an eval function is a small step up from writing a filter function. Consider the UDF in Example 16-2, which trims the leading and trailing whitespace from chararray values using the trim() method on java.lang.String.[101]
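The core logic of such a function is small. As a dependency-free sketch (it deliberately does not extend Pig's EvalFunc, so TrimSketch and trimField are hypothetical names), it passes null through, on the assumption that a null argument should yield a null result, and otherwise calls String.trim(), which strips leading and trailing characters up to and including U+0020 (the space and ASCII control characters such as tab and newline).

```java
import java.util.Arrays;
import java.util.List;

class TrimSketch {
    // Hypothetical core of a trim UDF: null passes through, otherwise trim.
    static String trimField(Object field) {
        if (field == null) {
            return null; // assumed convention: null in, null out
        }
        // chararray values are represented in Java as java.lang.String
        return ((String) field).trim();
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("  pomegranate ", "\tbanana\n", "apple", null);
        for (String v : values) {
            System.out.println("[" + trimField(v) + "]");
        }
    }
}
```

Whitespace inside the value is left untouched; only the ends are trimmed.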
Example 16-2. An EvalFunc UDF to trim leading and trailing whitespace from chararray values