hive> SELECT STRIP(' bee ') FROM dummy;
bee
If you want to remove the function, use the DROP FUNCTION statement:
DROP FUNCTION strip;
It's also possible to create a function that lasts only for the duration of the Hive session,
and so is not persisted in the metastore, by using the TEMPORARY keyword:
ADD JAR /path/to/hive-examples.jar;
CREATE TEMPORARY FUNCTION strip AS 'com.hadoopbook.hive.Strip';
When using temporary functions, it may be useful to create a .hiverc file in your home
directory containing the commands to define your UDFs. The file will be automatically
run at the beginning of each Hive session.
NOTE
As an alternative to calling ADD JAR at launch time, you can specify a path where Hive looks for
auxiliary JAR files to put on its classpath (including the task classpath). This technique is useful for
automatically adding your own library of UDFs every time you run Hive.
There are two ways of specifying the path. Either pass the --auxpath option to the hive command:
% hive --auxpath /path/to/hive-examples.jar
or set the HIVE_AUX_JARS_PATH environment variable before invoking Hive. The auxiliary path may
be a comma-separated list of JAR file paths or a directory containing JAR files.
Writing a UDAF
An aggregate function is more difficult to write than a regular UDF. Values are aggregated
in chunks (potentially across many tasks), so the implementation has to be capable of
combining partial aggregations into a final result. The code to achieve this is best
explained by example, so let's look at the implementation of a simple UDAF for calculating
the maximum of a collection of integers (Example 17-3).
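Before turning to the Hive-specific code, the idea of combining partial aggregations can be sketched in plain Java. The class below is hypothetical and does not use the Hive API. Its method names only mirror the init, iterate, terminatePartial, merge, and terminate convention of a UDAF evaluator, and the maxOfChunks helper is an assumption that stands in for the framework splitting the input across tasks.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java sketch of partial aggregation; NOT the Hive API.
final class PartialMax {
    // null means "no values seen yet", so an empty input yields null.
    private Integer result;

    // Reset the aggregation state.
    void init() {
        result = null;
    }

    // Fold one input value into the running maximum; nulls are ignored.
    boolean iterate(Integer value) {
        if (value == null) {
            return true;
        }
        result = (result == null) ? value : Math.max(result, value);
        return true;
    }

    // Hand back the partial result computed for one chunk.
    Integer terminatePartial() {
        return result;
    }

    // Combine another chunk's partial result with this one.
    // For a maximum, merging is the same operation as iterating.
    boolean merge(Integer other) {
        return iterate(other);
    }

    // Produce the final result.
    Integer terminate() {
        return result;
    }

    // Assumed driver: aggregate each chunk separately, then merge the partials.
    static Integer maxOfChunks(List<List<Integer>> chunks) {
        PartialMax finalAgg = new PartialMax();
        finalAgg.init();
        for (List<Integer> chunk : chunks) {
            PartialMax partial = new PartialMax();
            partial.init();
            for (Integer v : chunk) {
                partial.iterate(v);
            }
            finalAgg.merge(partial.terminatePartial());
        }
        return finalAgg.terminate();
    }

    public static void main(String[] args) {
        List<List<Integer>> chunks = Arrays.asList(
            Arrays.asList(1, 5), Arrays.asList(3), Arrays.<Integer>asList());
        System.out.println(maxOfChunks(chunks)); // prints 5
    }
}
```

The maximum is an easy case because the partial result has the same type as the final result and merging reuses the iterate logic. An aggregate such as a mean would need a richer partial state (for example, a sum and a count).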
Example 17-3. A UDAF for calculating the maximum of a collection of integers
package com.hadoopbook.hive;
import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
import org.apache.hadoop.io.IntWritable;
public class Maximum extends UDAF {