An evaluator must implement five methods, described in turn here (the flow is illustrated in Figure 17-3):
init()
The init() method initializes the evaluator and resets its internal state. In MaximumIntUDAFEvaluator, we set the IntWritable object holding the final result to null. We use null to indicate that no values have been aggregated yet, which has the desirable effect of making the maximum value of an empty set NULL.
iterate()
The iterate() method is called every time there is a new value to be aggregated. The evaluator should update its internal state with the result of performing the aggregation. The arguments that iterate() takes correspond to those in the Hive function from which it was called. In this example, there is only one argument. The value is first checked to see whether it is null, and if it is, it is ignored. Otherwise, the result instance variable is set either to value's integer value (if this is the first value that has been seen) or to the larger of the current result and value (if one or more values have already been seen). We return true to indicate that the input value was valid.
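The init() and iterate() logic described above can be sketched in plain Java. This is only an illustration under stated assumptions: java.lang.Integer stands in for Hadoop's IntWritable so that the code runs without Hive on the classpath, and the class name MaxIterateSketch and the accessor currentResult() are hypothetical names introduced here, not part of the Hive API.

```java
// Sketch of the init()/iterate() behavior described in the text.
// Assumption: Integer replaces IntWritable; names are hypothetical.
class MaxIterateSketch {
    // null means "no values have been aggregated yet".
    private Integer result;

    // Reset the internal state before a new aggregation starts.
    public void init() {
        result = null;
    }

    // Called once per input value.
    public boolean iterate(Integer value) {
        if (value == null) {
            return true; // null inputs are ignored
        }
        if (result == null) {
            result = value; // first value seen
        } else {
            result = Math.max(result, value); // keep the larger value
        }
        return true;
    }

    // Hypothetical accessor, used here only to inspect the state.
    public Integer currentResult() {
        return result;
    }
}
```

Calling init() and then iterate() with 3, null, 7, and 5 leaves the state at 7; with no calls to iterate(), the state stays null.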
terminatePartial()
The terminatePartial() method is called when Hive wants a result for the partial aggregation. The method must return an object that encapsulates the state of the aggregation. In this case, an IntWritable suffices because it encapsulates either the maximum value seen or null if no values have been processed.
merge()
The merge() method is called when Hive decides to combine one partial aggregation with another. The method takes a single object, whose type must correspond to the return type of the terminatePartial() method. In this example, the merge() method can simply delegate to the iterate() method because the partial aggregation is represented in the same way as a value being aggregated. This is not generally the case (we'll see a more general example later); when the two representations differ, merge() must implement the logic to combine the evaluator's state with the state of the partial aggregation.
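To see why merge() cannot always delegate to iterate(), consider a mean rather than a maximum. The sketch below is a hypothetical illustration, not code from the source: the class MeanMergeSketch and its nested PartialResult are invented names, and plain Java types are used so that it runs without Hive. The partial state is a (sum, count) pair, which is not the same kind of thing as a single input value, so merge() has to combine the two states field by field.

```java
// Hypothetical mean evaluator; plain Java stand-in, not the Hive API.
class MeanMergeSketch {
    // Partial state: a running sum and a count, not a single input value.
    static class PartialResult {
        double sum;
        long count;
    }

    private PartialResult partial;

    public void init() {
        partial = null;
    }

    public boolean iterate(Double value) {
        if (value == null) {
            return true; // null inputs are ignored
        }
        if (partial == null) {
            partial = new PartialResult();
        }
        partial.sum += value;
        partial.count++;
        return true;
    }

    // The return type here is what merge() must accept.
    public PartialResult terminatePartial() {
        return partial;
    }

    // Combines another partial aggregation with this evaluator's state.
    public boolean merge(PartialResult other) {
        if (other == null) {
            return true; // the other side saw no values
        }
        if (partial == null) {
            partial = new PartialResult();
        }
        partial.sum += other.sum;
        partial.count += other.count;
        return true;
    }

    public Double terminate() {
        if (partial == null) {
            return null; // mean of an empty set
        }
        return partial.sum / partial.count;
    }
}
```

Merging a partial built from 1.0 and 2.0 with one built from 6.0 gives a sum of 9.0 over 3 values, so terminate() returns 3.0.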
terminate()
The terminate() method is called when the final result of the aggregation is needed. The evaluator should return its state as a value. In this case, we return the result instance variable.
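Putting the five methods together, the overall flow can be simulated outside Hive. The following is a minimal sketch under the same assumption as before (Integer in place of IntWritable); the class MaxFlowSketch and the driver method aggregate() are hypothetical and only mimic the order in which the methods are called: one evaluator per partition produces a partial result, and a final evaluator merges the partials and terminates.

```java
// Sketch of the full evaluator life cycle for a maximum.
// Assumption: Integer replaces IntWritable; names are hypothetical.
class MaxFlowSketch {
    private Integer result;

    public void init() {
        result = null;
    }

    public boolean iterate(Integer value) {
        if (value == null) {
            return true; // null inputs are ignored
        }
        result = (result == null) ? value : Integer.valueOf(Math.max(result, value));
        return true;
    }

    // The partial state is just the maximum seen so far (or null).
    public Integer terminatePartial() {
        return result;
    }

    // A partial maximum looks like an ordinary value, so delegate.
    public boolean merge(Integer other) {
        return iterate(other);
    }

    public Integer terminate() {
        return result;
    }

    // Hypothetical driver: one evaluator per partition, then a final merge.
    static Integer aggregate(Integer[][] partitions) {
        MaxFlowSketch combined = new MaxFlowSketch();
        combined.init();
        for (Integer[] partition : partitions) {
            MaxFlowSketch partial = new MaxFlowSketch();
            partial.init();
            for (Integer value : partition) {
                partial.iterate(value);
            }
            combined.merge(partial.terminatePartial());
        }
        return combined.terminate();
    }
}
```

For the partitions {3, 9}, {} and {4, null}, aggregate() returns 9; with no partitions at all it returns null, matching the NULL result for an empty set described earlier.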