Hive - Hadoop: The Definitive Guide

Database Reference

In-Depth Information

User-Defined Functions

Sometimes the query you want to write can't be expressed easily (or at all) using the built-

in functions that Hive provides. By allowing you to write a user-defined function (UDF),

Hive makes it easy to plug in your own processing code and invoke it from a Hive query.

UDFs have to be written in Java, the language that Hive itself is written in. For other lan-

guages, consider using a SELECT TRANSFORM query, which allows you to stream data

through a user-defined script ( MapReduce Scripts ).

There are three types of UDF in Hive: (regular) UDFs, user-defined aggregate functions

(UDAFs), and user-defined table-generating functions (UDTFs). They differ in the number

of rows that they accept as input and produce as output:

▪ A UDF operates on a single row and produces a single row as its output. Most

functions, such as mathematical functions and string functions, are of this type.

▪ A UDAF works on multiple input rows and creates a single output row. Aggregate

functions include such functions as COUNT and MAX .

▪ A UDTF operates on a single row and produces multiple rows — a table — as out-

put.

Table-generating functions are less well known than the other two types, so let's look at an

example. Consider a table with a single column, x , which contains arrays of strings. It's in-

structive to take a slight detour to see how the table is defined and populated:

CREATE TABLE arrays (x ARRAY<STRING>)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY '\001'

COLLECTION ITEMS TERMINATED BY '\002';

Notice that the ROW FORMAT clause specifies that the entries in the array are delimited by

Ctrl-B characters. The example file that we are going to load has the following contents,

where ^B is a representation of the Ctrl-B character to make it suitable for printing:

a^Bb

c^Bd^Be

After running a LOAD DATA command, the following query confirms that the data was

loaded correctly:

hive> SELECT * FROM arrays;

["a","b"]

["c","d","e"]

Search WWH ::

Custom Search

Home