The object input belongs to the Tuple class, which has two methods for retrieving its
content.
List<Object> getAll();
Object get(int fieldNum) throws ExecException;
The getAll() method returns all fields in the tuple as an ordered list. UPPER instead
uses the get() method to request a specific field (at position 0). This method throws
an ExecException if the requested field number is greater than or equal to the number
of fields in the tuple (positions are counted from 0). In UPPER the retrieved field is
cast to a Java String, which usually works but may cause a cast exception if the field
holds an incompatible data type. We'll see later how to use Pig to ensure that our
casting works. In any case, the try/catch block catches and handles any exception. If
everything works, UPPER's exec() method returns a String with its characters
uppercased. In addition, most UDFs should implement the default behavior that the
output is null when the input tuple is null.
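The control flow described above can be sketched in plain Java. This is only an illustration, not the book's UPPER listing: SimpleTuple and FieldAccessException are hypothetical stand-ins for Pig's Tuple and ExecException so that the example runs without Pig on the classpath, and returning null from the catch block is an assumption about how the failure is handled.

```java
import java.util.Arrays;
import java.util.List;

final class UpperSketch {

    /** Hypothetical stand-in for Pig's ExecException. */
    public static class FieldAccessException extends Exception {
        public FieldAccessException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for Pig's Tuple with the two accessors discussed. */
    public static class SimpleTuple {
        private final List<Object> fields;

        public SimpleTuple(Object... fields) {
            this.fields = Arrays.asList(fields);
        }

        public List<Object> getAll() {
            return fields;
        }

        public Object get(int fieldNum) throws FieldAccessException {
            // Positions are zero-based, so fieldNum == size is already out of range.
            if (fieldNum < 0 || fieldNum >= fields.size()) {
                throw new FieldAccessException("No field at position " + fieldNum);
            }
            return fields.get(fieldNum);
        }
    }

    /** Uppercases field 0; yields null for null input or any failure. */
    public static String exec(SimpleTuple input) {
        if (input == null) {
            return null; // default behavior: null in, null out
        }
        try {
            String str = (String) input.get(0); // may throw ClassCastException
            return str.toUpperCase();
        } catch (Exception e) {
            // Covers an out-of-range field, a bad cast, and a null field value.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(exec(new SimpleTuple("hello"))); // prints HELLO
        System.out.println(exec(new SimpleTuple(42)));      // prints null (bad cast)
        System.out.println(exec(new SimpleTuple()));        // prints null (no field 0)
        System.out.println(exec(null));                     // prints null
    }
}
```

Catching the broad Exception type keeps the sketch short; a real UDF might prefer to log the failure before returning null.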
In addition to implementing exec(), UPPER also overrides a couple of methods from
EvalFunc, one of which is getArgToFuncMapping():
@Override
public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
    List<FuncSpec> funcList = new ArrayList<FuncSpec>();
    funcList.add(new FuncSpec(this.getClass().getName(),
        new Schema(new Schema.FieldSchema(null, DataType.CHARARRAY))));
    return funcList;
}
The getArgToFuncMapping() method returns a List of FuncSpec objects, each pairing an
implementing class with the schema it expects for the fields of the input tuple. Pig
will handle typecasting for you by converting the types of all fields in a tuple to
conform to this schema before passing it to exec(). It will pass fields that can't be
converted to the desired type as null.
UPPER only cares about the type of the first field, so it adds only one FuncSpec to
the list, and this FuncSpec states that the field must be of type chararray , represented
as DataType.CHARARRAY . The instantiation of FuncSpec is quite convoluted, which
is due to Pig's ability to handle complex nested types. Fortunately, unless you work
with unusually complicated types, you'll probably find a FuncSpec instantiation for the
type you want already in one of PiggyBank's UDFs. Reuse that in your code. You can
even reuse the entire getArgToFuncMapping() function if you have the same tuple
schema as another UDF.
Besides telling Pig the input schema, you can also tell Pig the schema of your
output. You may not need to do this if the output of your UDF is a simple scalar, as
Pig will use Java's reflection mechanism to infer the schema automatically. But if
your UDF returns a tuple or a bag, reflection cannot determine the full schema. In
that case you should specify it so that Pig can propagate the schema correctly.
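To see why reflection suffices for scalars but not for containers, consider this plain-Java sketch. It does not reproduce Pig's inference code; ScalarUdf and TupleUdf are hypothetical classes, and List<Object> merely stands in for a tuple-like return type. Reflection reports the declared return type of exec(), which fully describes a String but says nothing about the number, names, or types of the fields inside a container.

```java
import java.lang.reflect.Method;
import java.util.List;

final class ReturnTypeSketch {

    /** Hypothetical UDF-like class returning a simple scalar. */
    public static class ScalarUdf {
        public String exec(Object input) {
            return null;
        }
    }

    /** Hypothetical UDF-like class returning a container of fields. */
    public static class TupleUdf {
        public List<Object> exec(Object input) {
            return null;
        }
    }

    /** Looks up the declared return type of the class's exec(Object) method. */
    public static Class<?> returnTypeOf(Class<?> udfClass) throws NoSuchMethodException {
        Method exec = udfClass.getDeclaredMethod("exec", Object.class);
        return exec.getReturnType();
    }

    public static void main(String[] args) throws NoSuchMethodException {
        // A scalar return type maps directly onto a single field type.
        System.out.println(returnTypeOf(ScalarUdf.class));

        // A container type reveals only that it is a List; the fields it
        // will hold at run time are invisible to reflection.
        System.out.println(returnTypeOf(TupleUdf.class));
    }
}
```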
 