An evaluator must implement five methods, described in turn here (the flow is illustrated
in Figure 17-3):
init()
The init() method initializes the evaluator and resets its internal state. In
MaximumIntUDAFEvaluator, we set the IntWritable object holding the final result
to null. We use null to indicate that no values have been aggregated yet, which has
the desirable effect of making the maximum value of an empty set NULL.
iterate()
The iterate() method is called every time there is a new value to be aggregated.
The evaluator should update its internal state with the result of performing the
aggregation. The arguments that iterate() takes correspond to those in the Hive function
from which it was called. In this example, there is only one argument. The value is first
checked to see whether it is null, and if it is, it is ignored. Otherwise, the result
instance variable is set either to value's integer value (if this is the first value that has
been seen) or to the larger of the current result and value (if one or more values have
already been seen). We return true to indicate that the input value was valid.
terminatePartial()
The terminatePartial() method is called when Hive wants a result for the partial
aggregation. The method must return an object that encapsulates the state of the
aggregation. In this case, an IntWritable suffices because it encapsulates either the
maximum value seen or null if no values have been processed.
merge()
The merge() method is called when Hive decides to combine one partial aggregation
with another. The method takes a single object, whose type must correspond to the
return type of the terminatePartial() method. In this example, the merge()
method can simply delegate to the iterate() method because the partial aggregation
is represented in the same way as a value being aggregated. This is not generally
the case (we'll see a more general example later). In general, the method should
implement the logic to combine the evaluator's state with the state of the partial
aggregation.
terminate()
The terminate() method is called when the final result of the aggregation is
needed. The evaluator should return its state as a value. In this case, we return the
result instance variable.
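
The five methods and the order in which they are called can be sketched in plain Java.
This is only an illustration of the contract described above, not the
MaximumIntUDAFEvaluator code itself. It assumes Integer as a stand-in for IntWritable so
that it runs without Hive or Hadoop on the classpath, and the class name
MaxIntEvaluatorSketch and the sample input values are hypothetical.

```java
// Plain-Java sketch of the five-method evaluator contract.
// Assumption: Integer replaces Hive's IntWritable; the class name is hypothetical.
final class MaxIntEvaluatorSketch {
    // Running maximum; null means no values have been aggregated yet.
    private Integer result;

    // Reset the internal state.
    public void init() {
        result = null;
    }

    // Fold one input value into the running maximum; null inputs are ignored.
    public boolean iterate(Integer value) {
        if (value == null) {
            return true;
        }
        if (result == null || value > result) {
            result = value;
        }
        return true;
    }

    // The partial state is just the maximum seen so far (or null).
    public Integer terminatePartial() {
        return result;
    }

    // A partial maximum looks like an ordinary value, so delegate to iterate().
    public boolean merge(Integer other) {
        return iterate(other);
    }

    // The final result is the evaluator's state.
    public Integer terminate() {
        return result;
    }

    public static void main(String[] args) {
        // Two independent partial aggregations, as if run over separate input splits.
        MaxIntEvaluatorSketch first = new MaxIntEvaluatorSketch();
        first.init();
        first.iterate(2);
        first.iterate(null);
        first.iterate(7);

        MaxIntEvaluatorSketch second = new MaxIntEvaluatorSketch();
        second.init();
        second.iterate(5);

        // A third evaluator combines the partial results and produces the final value.
        MaxIntEvaluatorSketch combined = new MaxIntEvaluatorSketch();
        combined.init();
        combined.merge(first.terminatePartial());
        combined.merge(second.terminatePartial());
        System.out.println("max = " + combined.terminate());

        // An evaluator that never sees a value returns null.
        MaxIntEvaluatorSketch empty = new MaxIntEvaluatorSketch();
        empty.init();
        System.out.println("empty max = " + empty.terminate());
    }
}
```

Because a partial maximum has the same shape as an input value, merge() here is a
one-line delegation. An aggregation whose partial state carries more than the final
value needs its own partial-state type and its own merge() logic.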