Add a reference to the file, and then use the TRANSFORM clause to call the script:

add file C:\SampleData\get_maxValue.py;

SELECT TRANSFORM(s.recdate, s.sensor, s.v1, s.v2, s.v3, s.v4)
USING 'python get_maxValue.py'
AS (recdate, sensor, maxvalue)
FROM speeds s;
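The contents of get_maxValue.py are not shown in the text. As a sketch of how such a TRANSFORM script typically works: Hive streams each row to the script as a tab-delimited line on stdin, and the script writes tab-delimited output rows to stdout. The column positions below are assumptions based on the query above:

```python
import sys

def process_line(line):
    # Hive sends each input row as one tab-delimited line.
    fields = line.rstrip('\n').split('\t')
    recdate, sensor = fields[0], fields[1]
    # v1..v4 arrive as strings; convert before comparing.
    maxvalue = max(float(v) for v in fields[2:6])
    # Emit the output row as tab-delimited text for Hive to parse.
    return '\t'.join([recdate, sensor, str(maxvalue)])

if __name__ == '__main__':
    for line in sys.stdin:
        print(process_line(line))
```

The AS (recdate, sensor, maxvalue) clause in the query maps the three tab-delimited output fields back to named columns.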
The data output should look similar to Figure 9.23.
Figure 9.23 Output of the get_maxValue.py script
Creating Your Own UDFs for Hive
As mentioned previously, Hive supports several function types, depending
on the processing involved. The simplest type is the UDF, which takes a
single row as input, processes it, and returns a single result. The UDAF is
a little more involved: it aggregates over multiple input rows, reducing
them to fewer output rows. The third type you can create is the UDTF,
which takes a single row as input and expands it into multiple output
rows, like a table.
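These three behaviors map directly onto Hive's built-in functions, so before writing your own it can help to see each type in a query (using the speeds table from earlier):

```
SELECT upper(sensor) FROM speeds;        -- UDF: one value in, one value out

SELECT sensor, max(v1) FROM speeds
GROUP BY sensor;                         -- UDAF: many rows in, one row per group

SELECT explode(array(v1, v2, v3, v4))
FROM speeds;                             -- UDTF: one row in, many rows out
```

Here upper, max, explode, and array are all standard Hive built-ins; a custom function of each type plugs into a query in the same way.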
If you followed along in the earlier section on building custom UDFs for
Pig, you will find that building UDFs for Hive is a similar experience. First,
you create a project in your favorite Java development environment. Then,
you add references to the hive-exec.jar and hive-serde.jar files,
which are located in the lib subfolder of the Hive installation folder. After
you add these references, you add an import statement for the
org.apache.hadoop.hive.ql.exec.UDF class and extend it with a
custom class:
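As a minimal sketch of such a class (the class name and behavior here are illustrative, not from the text, and compiling it requires hive-exec.jar on the classpath):

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Illustrative example: a UDF that upper-cases a string column.
public final class ToUpper extends UDF {
    // Hive locates the evaluate() method by reflection,
    // matching its signature against the query's argument types.
    public Text evaluate(final Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().toUpperCase());
    }
}
```

After packaging the class into a jar, you register it in a Hive session with add jar followed by CREATE TEMPORARY FUNCTION before calling it in a query.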