Add a reference to the file and the TRANSFORM statement to call the script:

add file C:\SampleData\get_maxValue.py;
SELECT TRANSFORM(s.recdate, s.sensor, s.v1, s.v2, s.v3, s.v4)
USING 'python get_maxValue.py'
AS (recdate, sensor, maxvalue)
FROM speeds s;

The data output should look similar to Figure 9.23.
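The get_maxValue.py script itself is not listed here, but a minimal sketch of what such a streaming script might look like follows. The column order and the max-of-four-values logic are assumptions inferred from the TRANSFORM column list; Hive streams each input row to the script as tab-separated fields on stdin and reads tab-separated output rows from stdout:

```python
import sys

def process_line(line):
    # Hive sends one row per line, fields separated by tabs,
    # in the order listed in the TRANSFORM(...) clause.
    recdate, sensor, v1, v2, v3, v4 = line.strip().split('\t')
    # Emit the largest of the four sensor reading columns.
    max_value = max(float(v1), float(v2), float(v3), float(v4))
    return '\t'.join([recdate, sensor, str(max_value)])

if __name__ == '__main__':
    for line in sys.stdin:
        print(process_line(line))
```

The three output fields line up with the AS (recdate, sensor, maxvalue) clause in the query.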
Creating Your Own UDFs for Hive
As mentioned previously, Hive supports several function types depending on the processing involved. The simplest is the UDF, which takes a single row in, processes it, and returns a single value. The UDAF is a little more involved because it performs an aggregation across input values, reducing the number of rows coming out. The third type you can create is the UDTF, which takes a single row in and expands it into multiple output rows, like a table.
If you followed along in the earlier section on building custom UDFs for Pig, you will find that building UDFs for Hive is a similar experience. First, you create a project in your favorite Java development environment. Then, you add references to the hive-exec.jar and hive-serde.jar files, which are located in the lib subfolder of the hive folder. After you add these references, you add an import statement for the org.apache.hadoop.hive.ql.exec.UDF class and extend it with a custom class:
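A minimal sketch of such a custom class might look like the following. The class name and the upper-casing logic are illustrative, not from the text; what matters is extending UDF and providing an evaluate() method, which Hive discovers and calls by convention:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// A hypothetical UDF that returns its string argument in upper case.
// Hive matches the evaluate() signature to the column types at query time.
public final class ToUpperUDF extends UDF {
    public Text evaluate(final Text input) {
        // Returning null for null input keeps the UDF null-safe.
        if (input == null) {
            return null;
        }
        return new Text(input.toString().toUpperCase());
    }
}
```

After packaging the class into a JAR, you would register it in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before calling it in a query.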