Leveraging types
The filter works when the quality field is declared to be of type int, but if the type information is absent, the UDF fails! This happens because the field is the default type, bytearray, represented by the DataByteArray class. Because DataByteArray is not an Integer, the cast fails.
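The failure described above can be reproduced without Pig at all. The following is a minimal sketch, not the UDF itself: RawBytes is a hypothetical stand-in for Pig's DataByteArray, and castSucceeds() is an illustrative helper that attempts the same kind of cast a filter written for int fields would perform.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for Pig's DataByteArray (an assumption, so that the
// sketch compiles without Pig on the classpath).
final class RawBytes {
    private final byte[] data;

    RawBytes(String text) {
        this.data = text.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String toString() {
        return new String(data, StandardCharsets.UTF_8);
    }
}

class CastFailureSketch {
    // Attempts the cast that a filter written for int fields would perform.
    // Returns false if the runtime type of the field is not Integer.
    static boolean castSucceeds(Object field) {
        try {
            Integer value = (Integer) field;
            return value != null;
        } catch (ClassCastException e) {
            // The field holds raw bytes, which are not an Integer.
            return false;
        }
    }
}
```

Here castSucceeds(Integer.valueOf(1)) holds, while castSucceeds(new RawBytes("1")) does not, even though the bytes spell out a valid number.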
The obvious way to fix this is to convert the field to an integer in the exec() method. However, there is a better way, which is to tell Pig the types of the fields that the function expects. The getArgToFuncMapping() method on EvalFunc is provided for precisely this reason. We can override it to tell Pig that the first field should be an integer:
    @Override
    public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
      List<FuncSpec> funcSpecs = new ArrayList<FuncSpec>();
      funcSpecs.add(new FuncSpec(this.getClass().getName(),
          new Schema(new Schema.FieldSchema(null, DataType.INTEGER))));

      return funcSpecs;
    }
This method returns a FuncSpec object corresponding to each of the fields of the tuple that are passed to the exec() method. Here there is a single field, and we construct an anonymous FieldSchema (the name is passed as null, since Pig ignores the name when doing type conversion). The type is specified using the INTEGER constant on Pig's DataType class.
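For contrast, the manual alternative mentioned earlier, converting the field inside exec(), might look roughly like the sketch below. It is an illustration under assumptions rather than code from the UDF: ByteField is a hypothetical stand-in for Pig's DataByteArray, and toInteger() is an invented helper name. It shows the kind of boilerplate that declaring the expected type lets the function avoid.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for Pig's DataByteArray (assumption for illustration).
final class ByteField {
    private final byte[] data;

    ByteField(String text) {
        this.data = text.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String toString() {
        return new String(data, StandardCharsets.UTF_8);
    }
}

class ManualConversionSketch {
    // Normalizes a field to an Integer by hand. Returns null when the field
    // is absent or cannot be interpreted as an integer.
    static Integer toInteger(Object field) {
        if (field instanceof Integer) {
            return (Integer) field;           // already typed, nothing to do
        }
        if (field instanceof ByteField) {
            try {
                return Integer.valueOf(field.toString().trim());
            } catch (NumberFormatException e) {
                return null;                  // bytes do not spell an integer
            }
        }
        return null;                          // null or an unsupported type
    }
}
```

Every function that accepts untyped input would need a helper of this sort, which is why handing the conversion to Pig is the tidier choice.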
With the amended function, Pig will attempt to convert the argument passed to the function to an integer. If the field cannot be converted, then a null is passed for the field. The exec() method always returns false when the field is null. For this application, this behavior is appropriate, as we want to filter out records whose quality field is unintelligible.
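The interaction between a failed conversion and the filter can be sketched in plain Java. This is a simplified model, not Pig's actual conversion code: convert() stands in for the conversion Pig performs before calling the function, and the set of acceptable quality codes is a hypothetical placeholder chosen only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class NullHandlingSketch {
    // Hypothetical set of acceptable quality codes (an assumption, not taken
    // from the text).
    private static final Set<Integer> GOOD_CODES =
            new HashSet<Integer>(Arrays.asList(0, 1));

    // Simplified model of the conversion step: input that cannot be read as
    // an integer becomes null instead of raising an error.
    static Integer convert(String raw) {
        if (raw == null) {
            return null;
        }
        try {
            return Integer.valueOf(raw.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Mirrors the filter's contract: a null field never passes.
    static boolean isGoodQuality(Integer quality) {
        return quality != null && GOOD_CODES.contains(quality);
    }
}
```

With this model, isGoodQuality(convert("1")) is true, whereas isGoodQuality(convert("??")) is false because the unintelligible value reaches the filter as null.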
An Eval UDF
Writing an eval function is a small step up from writing a filter function. Consider the UDF in Example 16-2, which trims the leading and trailing whitespace from chararray values.
Example 16-2. An EvalFunc UDF to trim leading and trailing whitespace from chararray values