select f1, f2, GetMaxInt(f1,f2) from TestData;
Figure 9.24 shows the resulting output.
Figure 9.24 Sample UDF output
Once you are comfortable creating custom UDFs for Hive, you can
investigate creating user-defined aggregate functions (UDAFs) and
user-defined table-generating functions (UDTFs). You can find more
information about creating custom functions on the Apache Hive Wiki
(https://cwiki.apache.org/confluence/display/Hive/Home).
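As a refresher, the heart of a scalar UDF like the GetMaxInt function used in the query above is just a Java evaluate method. The sketch below shows that logic as plain, runnable Java; in a real Hive UDF the class would extend org.apache.hadoop.hive.ql.exec.UDF (or the newer GenericUDF interface), and Hive would invoke evaluate once per row. The class and method bodies here are illustrative, not the chapter's exact code.

```java
// Sketch of the evaluate logic behind a simple Hive UDF.
// In a deployable UDF, this class would extend
// org.apache.hadoop.hive.ql.exec.UDF; Hive calls evaluate()
// once for each row of the query.
public class GetMaxInt {

    // Returns the larger of the two column values. Hive passes
    // null for NULL columns, so guard against nulls explicitly.
    public Integer evaluate(Integer f1, Integer f2) {
        if (f1 == null) return f2;
        if (f2 == null) return f1;
        return Math.max(f1, f2);
    }

    public static void main(String[] args) {
        GetMaxInt udf = new GetMaxInt();
        // Mirrors: select f1, f2, GetMaxInt(f1, f2) from TestData;
        System.out.println(udf.evaluate(3, 7));    // prints 7
        System.out.println(udf.evaluate(null, 5)); // prints 5
    }
}
```

Packaging this class into a jar and registering it with Hive makes it callable from HiveQL exactly like a built-in function.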
Summary
In this chapter, you saw how Pig and Hive are used to apply data processing
on top of Hadoop. You can use these tools to perform complex extraction,
transformation, and loading (ETL) of your big data. Both of these tools
process the data using functions implemented in Java. Some of these
functions are part of the native functionality of the toolset and are ready to
use right out of the box. Others are available through the open source
community; you can download and install the jar files for these libraries
and use them to extend the native functionality of your scripts. If you are
comfortable programming in Java, you can even create your own functions
to meet your own unique processing requirements.
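Once you have such a library's jar file in hand, wiring it into a Hive session takes only a couple of statements. The sketch below uses a hypothetical jar path, class name, and function name; substitute the values for the library you actually installed:

```sql
-- Hypothetical jar path and class; replace with your library's values.
ADD JAR /tmp/my-udfs.jar;
CREATE TEMPORARY FUNCTION GetMaxInt AS 'com.example.hive.GetMaxInt';

-- The function can now be called like any built-in:
select f1, f2, GetMaxInt(f1, f2) from TestData;
```

CREATE TEMPORARY FUNCTION registers the function for the current session only; it must be re-registered (or created as a permanent function) in new sessions.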
One takeaway from this chapter should be an appreciation of how
extensible Pig Latin and HiveQL are through their pluggable interfaces. This
chapter might even jump-start you to investigate further the process of
creating, and maybe even sharing, your own custom function libraries.