select f1, f2, GetMaxInt(f1,f2) from TestData;
Figure 9.24 shows the resulting output.
Figure 9.24 Sample UDF output
Once you are comfortable creating custom UDFs for Hive, you can
investigate creating user-defined aggregate functions (UDAFs) and
user-defined table-generating functions (UDTFs). You can find more
information about creating custom functions on the Apache Hive Wiki
(https://cwiki.apache.org/confluence/display/Hive/Home).
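As a refresher, the heart of a scalar UDF like the GetMaxInt function used in the query above is just a Java evaluate method. The sketch below shows that logic as plain, runnable Java; in a real Hive UDF the class would extend org.apache.hadoop.hive.ql.exec.UDF (or the newer GenericUDF interface), and Hive would invoke evaluate once per row. The class and method bodies here are illustrative, not the chapter's exact code.

```java
// Sketch of the evaluate logic behind a simple Hive UDF.
// In a deployable UDF, this class would extend
// org.apache.hadoop.hive.ql.exec.UDF; Hive calls evaluate()
// once for each row of the query.
public class GetMaxInt {

    // Returns the larger of the two column values. Hive passes
    // null for NULL columns, so guard against nulls explicitly.
    public Integer evaluate(Integer f1, Integer f2) {
        if (f1 == null) return f2;
        if (f2 == null) return f1;
        return Math.max(f1, f2);
    }

    public static void main(String[] args) {
        GetMaxInt udf = new GetMaxInt();
        // Mirrors: select f1, f2, GetMaxInt(f1, f2) from TestData;
        System.out.println(udf.evaluate(3, 7));    // prints 7
        System.out.println(udf.evaluate(null, 5)); // prints 5
    }
}
```

Packaging this class into a jar and registering it with Hive makes it callable from HiveQL exactly like a built-in function.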
Summary
In this chapter, you saw how Pig and Hive are used to apply data processing
on top of Hadoop. You can use these tools to perform complex extraction,
transformation, and loading (ETL) of your big data. Both of these tools
process the data using functions implemented in Java. Some of these
functions are part of the native functionality of the toolset and are ready to
use right out of the box. Others are available through the open source
community; you can download and install the jar files for these libraries
and use them to extend the native functionality of your scripts. If you are
comfortable programming in Java, you can even create your own functions
to meet your own unique processing requirements.
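Once you have such a library's jar file in hand, wiring it into a Hive session takes only a couple of statements. The sketch below uses a hypothetical jar path, class name, and function name; substitute the values for the library you actually installed:

```sql
-- Hypothetical jar path and class; replace with your library's values.
ADD JAR /tmp/my-udfs.jar;
CREATE TEMPORARY FUNCTION GetMaxInt AS 'com.example.hive.GetMaxInt';

-- The function can now be called like any built-in:
select f1, f2, GetMaxInt(f1, f2) from TestData;
```

CREATE TEMPORARY FUNCTION registers the function for the current session only; it must be re-registered (or created as a permanent function) in new sessions.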
One takeaway from this chapter should be an appreciation of how
extensible Pig Latin and HiveQL are through their pluggable interfaces. This
chapter might even jump-start you to investigate further the process of
creating, and maybe even sharing, your own custom function libraries.