Example 9-36. Python string length UDF
from pyspark.sql.types import IntegerType  # import path may vary by Spark version

# Make a UDF to tell us how long some text is
hiveCtx.registerFunction("strLenPython", lambda x: len(x), IntegerType())
lengthSchemaRDD = hiveCtx.sql("SELECT strLenPython('text') FROM tweets LIMIT 10")
Example 9-37. Scala string length UDF
hiveCtx.registerFunction("strLenScala", (_: String).length)
val tweetLength = hiveCtx.sql("SELECT strLenScala('tweet') FROM tweets LIMIT 10")
Defining UDFs in Java requires a few additional imports. As with the functions we
defined for RDDs, we extend a special class: depending on the number of parameters
the UDF takes, we extend UDF[N], as shown in Examples 9-38 and 9-39.
Example 9-38. Java UDF imports
// Import UDF function class and DataTypes
// Note: these import paths may change in a future release
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
Example 9-39. Java string length UDF
hiveCtx.udf().register("stringLengthJava", new UDF1<String, Integer>() {
  @Override
  public Integer call(String str) throws Exception {
    return str.length();
  }
}, DataTypes.IntegerType);
SchemaRDD tweetLength = hiveCtx.sql(
  "SELECT stringLengthJava('text') FROM tweets LIMIT 10");
List<Row> lengths = tweetLength.collect();
for (Row row : lengths) {
  System.out.println(row.get(0));
}
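If your UDF takes more than one argument, the same pattern applies with the matching UDF[N] interface. The sketch below uses UDF2 for a two-argument function; the function name and logic are hypothetical, and it assumes an additional import of org.apache.spark.sql.api.java.UDF2.
// Hypothetical two-argument UDF: combined length of two strings
// Assumes: import org.apache.spark.sql.api.java.UDF2;
hiveCtx.udf().register("combinedLengthJava", new UDF2<String, String, Integer>() {
  @Override
  public Integer call(String a, String b) throws Exception {
    return a.length() + b.length();
  }
}, DataTypes.IntegerType);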
Hive UDFs
Spark SQL can also use existing Hive UDFs. The standard Hive UDFs are included
automatically. If you have a custom UDF, make sure the JARs for it are included
with your application; if we are running the JDBC server, we can add them with the
--jars command-line flag. Developing Hive UDFs is beyond the scope of this book,
so we will instead introduce how to use existing Hive UDFs.
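As a minimal sketch of using one, we can register the UDF's implementing class with a CREATE TEMPORARY FUNCTION statement and then call it from SQL like any built-in function; the function and class names below are hypothetical placeholders for your own UDF.
// Register an existing Hive UDF by its implementing class (names are hypothetical)
hiveCtx.sql("CREATE TEMPORARY FUNCTION myLower AS 'com.example.hive.udf.MyLower'");
SchemaRDD lowered = hiveCtx.sql("SELECT myLower('text') FROM tweets LIMIT 10");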