Pig - Hadoop: The Definitive Guide

Leveraging types

The filter works when the quality field is declared to be of type int, but if the type information is absent, the UDF fails! This happens because the field is the default type, bytearray, represented by the DataByteArray class. Because DataByteArray is not an Integer, the cast fails.
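To see the failure concretely, here is a minimal plain-Java sketch that does not use Pig at all. RawBytes is a hypothetical stand-in for Pig's DataByteArray, and naiveIsZero is a hypothetical stand-in for a UDF that casts its argument straight to Integer; neither name comes from Pig.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: RawBytes is a hypothetical stand-in for Pig's DataByteArray.
class CastFailureDemo {

    // Hypothetical holder for the raw bytes of an untyped (bytearray) field.
    static final class RawBytes {
        final byte[] bytes;

        RawBytes(String text) {
            this.bytes = text.getBytes(StandardCharsets.UTF_8);
        }
    }

    // Mimics a filter that assumes its argument is already an Integer.
    static boolean naiveIsZero(Object field) {
        int value = (Integer) field; // ClassCastException if field is not an Integer
        return value == 0;
    }

    // Reports whether the naive cast throws ClassCastException for this field.
    static boolean castFails(Object field) {
        try {
            naiveIsZero(field);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object typed = Integer.valueOf(0);   // field declared as int
        Object untyped = new RawBytes("0");  // field left as the default bytearray
        System.out.println("typed field fails:   " + castFails(typed));   // false
        System.out.println("untyped field fails: " + castFails(untyped)); // true
    }
}
```

The cast succeeds only when the runtime object really is an Integer, which is why declaring the field as int in the schema is enough to make the original filter work.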

The obvious way to fix this is to convert the field to an integer in the exec() method.
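A sketch of that obvious approach, again in plain Java with a hypothetical RawBytes class standing in for DataByteArray, might look like the following. It assumes the raw bytes hold a UTF-8 decimal string; that is an illustrative assumption, not a description of how Pig stores fields.

```java
import java.nio.charset.StandardCharsets;

// Sketch of converting the field by hand inside exec(); RawBytes is a hypothetical
// stand-in for Pig's DataByteArray.
class ManualConversionSketch {

    static final class RawBytes {
        final byte[] bytes;

        RawBytes(String text) {
            this.bytes = text.getBytes(StandardCharsets.UTF_8);
        }
    }

    // Returns the field as an Integer, or null if it cannot be converted.
    static Integer toInteger(Object field) {
        if (field instanceof Integer) {
            return (Integer) field;
        }
        if (field instanceof RawBytes) {
            // Assumption: the bytes hold a UTF-8 decimal string such as "4".
            String text = new String(((RawBytes) field).bytes, StandardCharsets.UTF_8);
            try {
                return Integer.valueOf(text);
            } catch (NumberFormatException e) {
                return null;
            }
        }
        return null; // null or an unrecognized representation
    }

    public static void main(String[] args) {
        System.out.println(toInteger(new RawBytes("4")));   // 4
        System.out.println(toInteger(new RawBytes("n/a"))); // null
    }
}
```

This works, but every UDF that accepts untyped input would need similar conversion code of its own.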

However, there is a better way, which is to tell Pig the types of the fields that the function expects. The getArgToFuncMapping() method on EvalFunc is provided for precisely this reason. We can override it to tell Pig that the first field should be an integer:

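// Assumes imports of java.util.List, java.util.ArrayList, org.apache.pig.FuncSpec,
// org.apache.pig.data.DataType, org.apache.pig.impl.logicalLayer.FrontendException
// and org.apache.pig.impl.logicalLayer.schema.Schema.
@Override
public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
  // One FuncSpec per argument that exec() receives; here a single int field.
  List<FuncSpec> funcSpecs = new ArrayList<FuncSpec>();
  funcSpecs.add(new FuncSpec(this.getClass().getName(),
      new Schema(new Schema.FieldSchema(null, DataType.INTEGER))));
  return funcSpecs;
}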

This method returns a list containing one FuncSpec object for each of the fields of the tuple that is passed to the exec() method. Here there is a single field, and we construct an anonymous FieldSchema (the name is passed as null, since Pig ignores the name when doing type conversion). The type is specified using the INTEGER constant on Pig's DataType class.
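Conceptually, Pig uses the returned schemas to coerce each argument before calling exec(). The plain-Java sketch below models that contract: a list of declared types (a hypothetical DeclaredType enum, not Pig's DataType) is matched position by position against the incoming fields, and any bytearray value that cannot be converted becomes null. RawBytes is again a hypothetical stand-in for DataByteArray, and the sketch does not reproduce Pig's exact parsing rules.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of declared-type coercion; DeclaredType and RawBytes are hypothetical,
// not Pig's DataType or DataByteArray.
class DeclaredTypeCoercionSketch {

    enum DeclaredType { INTEGER, CHARARRAY }

    static final class RawBytes {
        final byte[] bytes;

        RawBytes(String text) {
            this.bytes = text.getBytes(StandardCharsets.UTF_8);
        }
    }

    // Coerces field i to declared type i; values that cannot be converted become null.
    static List<Object> coerce(List<?> fields, List<DeclaredType> declared) {
        if (fields.size() != declared.size()) {
            throw new IllegalArgumentException("one declared type per field is required");
        }
        List<Object> result = new ArrayList<>();
        for (int i = 0; i < fields.size(); i++) {
            result.add(coerceOne(fields.get(i), declared.get(i)));
        }
        return result;
    }

    static Object coerceOne(Object field, DeclaredType type) {
        if (!(field instanceof RawBytes)) {
            return field; // already typed, or null: passed through unchanged in this sketch
        }
        String text = new String(((RawBytes) field).bytes, StandardCharsets.UTF_8);
        switch (type) {
            case INTEGER:
                try {
                    return Integer.valueOf(text);
                } catch (NumberFormatException e) {
                    return null; // unconvertible value becomes null
                }
            case CHARARRAY:
                return text;
            default:
                throw new IllegalStateException("unhandled type: " + type);
        }
    }

    public static void main(String[] args) {
        List<Object> coerced = coerce(
                Arrays.asList(new RawBytes("1"), new RawBytes("x")),
                Arrays.asList(DeclaredType.INTEGER, DeclaredType.INTEGER));
        System.out.println(coerced); // [1, null]
    }
}
```

Keeping the conversion outside the function body is the design point: the UDF itself only ever sees an Integer or null.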

With the amended function, Pig will attempt to convert the argument passed to the function to an integer. If the field cannot be converted, then a null is passed for the field. The exec() method always returns false when the field is null. For this application, this behavior is appropriate, as we want to filter out records whose quality field is unintelligible.
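The null-handling rule can be illustrated with a small plain-Java sketch. The set of accepted quality codes below is a hypothetical example chosen for illustration, and a null entry represents a quality value that could not be converted to an integer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of a null-safe quality filter; GOOD_CODES is a hypothetical example set.
class NullSafeFilterSketch {

    static final Set<Integer> GOOD_CODES = new HashSet<>(Arrays.asList(0, 1, 4, 5, 9));

    // A null (unconvertible) quality value never passes the filter.
    static boolean isGoodQuality(Integer quality) {
        if (quality == null) {
            return false;
        }
        return GOOD_CODES.contains(quality);
    }

    // Keeps only the values accepted by isGoodQuality, preserving order.
    static List<Integer> keepGood(List<Integer> qualities) {
        List<Integer> kept = new ArrayList<>();
        for (Integer q : qualities) {
            if (isGoodQuality(q)) {
                kept.add(q);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // null stands for a quality field that could not be converted to an integer
        List<Integer> qualities = Arrays.asList(1, null, 3, 9);
        System.out.println(keepGood(qualities)); // [1, 9]
    }
}
```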

An Eval UDF

Writing an eval function is a small step up from writing a filter function. Consider the UDF in Example 16-2, which trims the leading and trailing whitespace from chararray values using the trim() method on java.lang.String.[101]
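The core of such a function is a single call to String.trim(). The plain-Java sketch below is not Example 16-2 and uses no Pig classes; trimField is a hypothetical name, and the sketch assumes that a null input should simply produce a null result.

```java
// Sketch of the per-value logic of a trim function; trimField is a hypothetical name.
class TrimSketch {

    // Null in, null out; otherwise strip leading and trailing whitespace.
    static String trimField(String value) {
        return value == null ? null : value.trim();
    }

    public static void main(String[] args) {
        System.out.println("[" + trimField("  hello world \t") + "]"); // [hello world]
        System.out.println(trimField(null));                             // null
    }
}
```

Note that String.trim() removes all leading and trailing characters up to and including U+0020, so tabs and newlines are stripped as well as spaces.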

Example 16-2. An EvalFunc UDF to trim leading and trailing whitespace from chararray values
