Database Reference
In-Depth Information
User-Defined Functions
Sometimes the query you want to write can't be expressed easily (or at all) using the built-
in functions that Hive provides. By allowing you to write a user-defined function (UDF),
Hive makes it easy to plug in your own processing code and invoke it from a Hive query.
UDFs have to be written in Java, the language that Hive itself is written in. For other lan-
guages, consider using a SELECT TRANSFORM query, which allows you to stream data
through a user-defined script ( MapReduce Scripts ).
There are three types of UDF in Hive: (regular) UDFs, user-defined aggregate functions
(UDAFs), and user-defined table-generating functions (UDTFs). They differ in the number
of rows that they accept as input and produce as output:
▪ A UDF operates on a single row and produces a single row as its output. Most
functions, such as mathematical functions and string functions, are of this type.
▪ A UDAF works on multiple input rows and creates a single output row. Aggregate
functions include such functions as COUNT and MAX .
▪ A UDTF operates on a single row and produces multiple rows — a table — as out-
put.
Table-generating functions are less well known than the other two types, so let's look at an
example. Consider a table with a single column, x , which contains arrays of strings. It's in-
structive to take a slight detour to see how the table is defined and populated:
CREATE TABLE arrays (x ARRAY<STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002';
Notice that the ROW FORMAT clause specifies that the entries in the array are delimited by
Ctrl-B characters. The example file that we are going to load has the following contents,
where ^B is a representation of the Ctrl-B character to make it suitable for printing:
a^Bb
c^Bd^Be
After running a LOAD DATA command, the following query confirms that the data was
loaded correctly:
hive> SELECT * FROM arrays;
["a","b"]
["c","d","e"]
Search WWH ::




Custom Search