hive> SELECT STRIP('  bee  ') FROM dummy;
bee
If you want to remove the function, use the DROP FUNCTION statement:
DROP FUNCTION strip;
It's also possible to create a function for the duration of the Hive session, so it is not persisted in the metastore, using the TEMPORARY keyword:
ADD JAR /path/to/hive-examples.jar;
CREATE TEMPORARY FUNCTION strip AS 'com.hadoopbook.hive.Strip';
When using temporary functions, it may be useful to create a .hiverc file in your home directory containing the commands to define your UDFs. The file will be automatically run at the beginning of each Hive session.
NOTE
As an alternative to calling ADD JAR at launch time, you can specify a path where Hive looks for auxiliary JAR files to put on its classpath (including the task classpath). This technique is useful for automatically adding your own library of UDFs every time you run Hive.
There are two ways of specifying the path. Either pass the --auxpath option to the hive command:
% hive --auxpath /path/to/hive-examples.jar
or set the HIVE_AUX_JARS_PATH environment variable before invoking Hive. The auxiliary path may be a comma-separated list of JAR file paths or a directory containing JAR files.
Writing a UDAF
An aggregate function is more difficult to write than a regular UDF. Values are aggregated in chunks (potentially across many tasks), so the implementation has to be capable of combining partial aggregations into a final result. The code to achieve this is best explained by example, so let's look at the implementation of a simple UDAF for calculating the maximum of a collection of integers (Example 17-3).
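Before turning to the Hive classes, it may help to see the idea of combining partial aggregations in isolation. The following is a minimal sketch in plain Java that does not use the Hive API at all: the class PartialMaxSketch, its nested PartialMax holder, and the helper maxOfChunks are hypothetical names introduced only for illustration. Each chunk of values is reduced to a partial maximum, and the partial maxima are then merged into a final result, with null standing for "no values seen yet".

```java
import java.util.List;

// Illustrative only: a plain-Java model of partial aggregation, not Hive code.
class PartialMaxSketch {

  // Running maximum for one chunk; null means no values have been seen yet.
  static final class PartialMax {
    private Integer result;

    // Fold one raw value into this aggregation, ignoring nulls.
    void iterate(Integer value) {
      if (value == null) {
        return;
      }
      if (result == null || value > result) {
        result = value;
      }
    }

    // Hand back the partial result computed for this chunk.
    Integer terminatePartial() {
      return result;
    }

    // Combine another chunk's partial result; for a maximum this is
    // the same operation as folding in a single value.
    void merge(Integer other) {
      iterate(other);
    }

    // Final result once every partial result has been merged.
    Integer terminate() {
      return result;
    }
  }

  // Hypothetical driver: aggregate each chunk separately, then merge.
  static Integer maxOfChunks(List<List<Integer>> chunks) {
    PartialMax finalAggregation = new PartialMax();
    for (List<Integer> chunk : chunks) {
      PartialMax partial = new PartialMax();
      for (Integer value : chunk) {
        partial.iterate(value);
      }
      finalAggregation.merge(partial.terminatePartial());
    }
    return finalAggregation.terminate();
  }
}
```

Calling maxOfChunks with the chunks [1, 2] and [3, 4] yields partial maxima 2 and 4, which merge to 4. Maximum is a convenient first case because the partial result has the same type as the input values. An aggregate such as a mean would need a richer partial result (for example, a sum and a count).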
Example 17-3. A UDAF for calculating the maximum of a collection of integers
package com.hadoopbook.hive;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
import org.apache.hadoop.io.IntWritable;

public class Maximum extends UDAF {