29 return null;
30 }
31 /*--------------------------------------------------------*/
32
33 } /* class CleanWS */
Line 1 defines the package name as wcudfs:
01 package wcudfs;
Pig and Hadoop functionality for tuples and utilities is imported into the UDF between lines 5 and 7. Line 5
imports the EvalFunc class, the base class that identifies this as an eval-type UDF:
05 import org.apache.pig.EvalFunc;
06 import org.apache.pig.data.Tuple;
07 import org.apache.hadoop.util.*;
Line 9 specifies the class name as CleanWS, which extends the EvalFunc class and has a String return type:
09 public class CleanWS extends EvalFunc<String>
Line 13 onward defines the exec method that will be called to process every tuple in the data:
13 public String exec(Tuple input) throws IOException
Line 21 builds the return string, replacing every character that is not in the character sets A-Z, a-z, or 0-9
with a space character:
21 return str.replaceAll("[^A-Za-z0-9]"," ");
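The effect of the expression on line 21 can be tried outside Pig. The following is a minimal sketch, not part of the UDF itself: the class name CleanWsDemo, the method name clean, and the null guard are assumptions made for illustration. Only the replaceAll call is taken from the listing.

```java
// Standalone sketch of the line 21 logic; no Pig libraries are needed.
// CleanWsDemo and clean() are hypothetical names used only for this example.
class CleanWsDemo {

    // Replace every character outside A-Z, a-z, and 0-9 with a single space.
    static String clean(String str) {
        if (str == null) {
            return null; // assumed guard: pass null through unchanged
        }
        return str.replaceAll("[^A-Za-z0-9]", " ");
    }

    public static void main(String[] args) {
        // The comma, the space, and the exclamation mark each become a space.
        System.out.println("[" + clean("Hello, World!") + "]");
        // Letters and digits are left untouched.
        System.out.println("[" + clean("abc123") + "]");
    }
}
```

Note that each unwanted character is replaced one for one, so a run of punctuation produces a run of spaces rather than a single space.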
For example, I built CleanWS as follows:
[hadoop@hc1nn wcudfs]$ cat build_clean_ws.sh
javac -classpath $PIG_HOME/pig-0.12.1.jar -Xlint:deprecation CleanWS.java
The Java compiler is called javac; the -classpath option adds the Pig library to the build. The
-Xlint:deprecation option makes the compiler report any deprecated API calls in the code. Running the script builds the class file:
[hadoop@hc1nn wcudfs]$ ./build_clean_ws.sh
[hadoop@hc1nn wcudfs]$ ls
build_clean_ws.sh CleanWS.class CleanWS.java
Compiling the Java file creates the class file. The class file for the UDF is then packaged into a library that can
be used within a Pig script. The library is built using the jar command with the options c (create), v (verbose), and f
(file). The next parameter is the name of the library to be created, followed by the list of classes to be placed in the library:
[hadoop@hc1nn wcudfs]$ cd ..
[hadoop@hc1nn pig]$ jar cvf wcudfs.jar wcudfs/*.class
added manifest
adding: wcudfs/CleanWS.class(in = 1318) (out= 727)(deflated 44%)
[hadoop@hc1nn pig]$ ls -l wcudfs.jar
-rw-rw-r--. 1 hadoop hadoop 2018 Jun 24 18:57 wcudfs.jar
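The jar command above is run from the parent directory (after the cd .. step) so that the entry is stored as wcudfs/CleanWS.class; Java class loading expects the path inside the library to match the package name. The following sketch illustrates that layout using the standard java.util.jar API. It is an assumption-laden illustration, not the build method used here: the class name JarLayoutDemo, the method buildAndList, and the placeholder bytes are hypothetical, and the jar written is a throwaway temporary file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical demo class: writes a small jar and lists its entry names.
class JarLayoutDemo {

    static List<String> buildAndList(Path jarPath, String entryName, byte[] content)
            throws IOException {
        // A manifest needs a version attribute, otherwise it is written empty.
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");

        // Roughly what "jar cf" does: write the manifest, then each named entry.
        try (JarOutputStream out =
                 new JarOutputStream(Files.newOutputStream(jarPath), manifest)) {
            out.putNextEntry(new JarEntry(entryName));
            out.write(content);
            out.closeEntry();
        }

        // Read the jar back and collect the stored entry names.
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            jar.stream().forEach(entry -> names.add(entry.getName()));
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("wcudfs-demo", ".jar");
        // Placeholder bytes stand in for a real compiled class file.
        List<String> names = buildAndList(tmp, "wcudfs/CleanWS.class", new byte[] {1, 2, 3});
        names.forEach(System.out::println);
        Files.delete(tmp);
    }
}
```

The entry name carries the package directory, which is why the class file is added as wcudfs/*.class rather than from inside the wcudfs directory.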
 