24     Date parsedate = incommingDateFormat.parse( s.toString() );
25
26     to_value.set( convertedDateFormat.format(parsedate) );
27
28    }
29    catch (Exception e)
30    {
31      to_value = new Text(s);
32    }
33   }
34   return to_value;
35  }
36 }
The package name is defined at line 1, while the import statements that bring in Hive, Hadoop, and Java functionality appear at lines 3 through 6.
1 package nz.co.semtech-solutions.hive.udf;
3 import org.apache.hadoop.hive.ql.exec.UDF;
4 import org.apache.hadoop.io.Text;
5 import java.text.SimpleDateFormat;
6 import java.util.Date;
The class DateConv, which implements the UDF, is defined at line 8; it extends the existing Hive class UDF.
8 class DateConv extends UDF
At line 11, the public method evaluate is defined, which takes a Text parameter and returns a Text value:
11 public Text evaluate(Text s)
Finally, the main functionality of the UDF occurs between lines 21 and 26, in the try/catch section of the code. The
input date string is converted from the format dd/MM/yyyy to the format yyyy-MM-dd. (This is a somewhat contrived
example that handles only a single input date format, but it gives an idea of what can be achieved with Hive UDFs.)
21 SimpleDateFormat incommingDateFormat = new SimpleDateFormat("dd/MM/yyyy");
22 SimpleDateFormat convertedDateFormat = new SimpleDateFormat("yyyy-MM-dd");
23
24 Date parsedate = incommingDateFormat.parse( s.toString() );
25
26 to_value.set( convertedDateFormat.format(parsedate) );
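The conversion logic at the heart of the try/catch block can be exercised in isolation with plain Java, without any Hive or Hadoop dependencies. The class and helper names below (DateConvDemo, convertDate) are hypothetical, chosen just for this sketch; the parsing and fallback behavior mirror the UDF fragments above:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class DateConvDemo {

    // Mirrors the UDF's try/catch: parse dd/MM/yyyy and reformat as
    // yyyy-MM-dd; if parsing fails, return the input string unchanged,
    // just as the UDF returns new Text(s) in its catch block.
    static String convertDate(String s) {
        SimpleDateFormat incomingDateFormat  = new SimpleDateFormat("dd/MM/yyyy");
        SimpleDateFormat convertedDateFormat = new SimpleDateFormat("yyyy-MM-dd");
        try {
            Date parsed = incomingDateFormat.parse(s);
            return convertedDateFormat.format(parsed);
        } catch (ParseException e) {
            return s;
        }
    }

    public static void main(String[] args) {
        System.out.println(convertDate("25/12/2014")); // prints 2014-12-25
        System.out.println(convertDate("not a date")); // falls back to the input
    }
}
```

The same fallback-on-failure design is what makes the UDF safe to run over a column containing malformed dates: bad rows pass through untouched rather than aborting the query.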
Having created the Java file that will form the new UDF, I move back to the top of the directory structure
using the Linux cd command and invoke the sbt command to compile the code:
[hadoop@hc2nn udf]$ cd /home/hadoop/hive/udf/
[hadoop@hc2nn udf]$ sbt
[info] Set current project to DateConv (in build file:/home/hadoop/hive/udf/)
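For sbt to resolve the Hive and Hadoop classes at compile time, the project needs a build definition. A minimal build.sbt along these lines would work; the artifact versions shown are assumptions, not taken from the source, and should match the cluster's Hive and Hadoop releases:

```scala
// Hypothetical minimal build.sbt for the DateConv UDF project.
name := "DateConv"

version := "0.1"

// Hive and Hadoop jars are needed only to compile; the cluster
// supplies them at run time, hence the "provided" scope.
libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-common" % "2.3.0"  % "provided",
  "org.apache.hive"   % "hive-exec"     % "0.13.0" % "provided"
)
```

With this in place, running sbt package produces a jar that can be added to Hive with ADD JAR and registered via CREATE TEMPORARY FUNCTION.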