29 return null;
30 }
31 /*--------------------------------------------------------*/
32
33 } /* class CleanWS */
Line 1 defines the package name as wcudfs:
01 package wcudfs;
Lines 5 to 7 import the Pig and Hadoop classes that the UDF needs for tuples and utilities. Line 5 imports the EvalFunc class, which identifies this as an eval type of UDF:
05 import org.apache.pig.EvalFunc;
06 import org.apache.pig.data.Tuple;
07 import org.apache.hadoop.util.*;
Line 9 specifies the class name as CleanWS, which extends the EvalFunc class and has a String return type:
09 public class CleanWS extends EvalFunc<String>
Line 13 onward defines the exec method, which is called to process every tuple in the data:
13 public String exec(Tuple input) throws IOException
Line 21 builds the return string by replacing every character that is not in the ranges A-Z, a-z, or 0-9 with a space character:
21 return str.replaceAll("[^A-Za-z0-9]"," ");
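To see what this expression does to typical input, here is a minimal stand-alone sketch that runs without the Pig library. The class name CleanDemo and the method name clean are illustrative assumptions and are not part of the UDF; only the regular expression is taken from line 21.

```java
class CleanDemo {

    // Same expression as line 21 of the UDF: every character outside
    // A-Z, a-z, and 0-9 is replaced by a single space.
    static String clean(String str) {
        return str.replaceAll("[^A-Za-z0-9]", " ");
    }

    public static void main(String[] args) {
        // Brackets make leading and trailing spaces visible.
        System.out.println("[" + clean("Hello, world!") + "]"); // [Hello  world ]
        System.out.println("[" + clean("it's 9am") + "]");      // [it s 9am]
        System.out.println("[" + clean("a-b_c") + "]");         // [a b c]
    }
}
```

Note that each unwanted character is replaced one for one, so the cleaned string keeps its original length and a run of punctuation becomes a run of spaces. The underscore is also replaced, because it is not in any of the three ranges.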
For example, I built CleanWS as follows:
[hadoop@hc1nn wcudfs]$ cat build_clean_ws.sh
javac -classpath $PIG_HOME/pig-0.12.1.jar -Xlint:deprecation CleanWS.java
The script calls the Java compiler, javac. The -classpath option adds the Pig library to the build, and the -Xlint:deprecation option makes the compiler report details of any deprecated API calls in the code. Running the script builds the class file:
[hadoop@hc1nn wcudfs]$ ./build_clean_ws.sh
[hadoop@hc1nn wcudfs]$ ls
build_clean_ws.sh CleanWS.class CleanWS.java
The build creates a class file from the Java file. The class file for the UDF is then packaged into a library that can be used within a Pig script. The library is built using the jar command with the options c (create), v (verbose), and f (file). The next parameter is the name of the library to be created, followed by the list of classes to be placed in the library:
[hadoop@hc1nn wcudfs]$ cd ..
[hadoop@hc1nn pig]$ jar cvf wcudfs.jar wcudfs/*.class
added manifest
adding: wcudfs/CleanWS.class(in = 1318) (out= 727)(deflated 44%)
[hadoop@hc1nn pig]$ ls -l wcudfs.jar
-rw-rw-r--. 1 hadoop hadoop 2018 Jun 24 18:57 wcudfs.jar