Example 9-36. Python string length UDF
# Make a UDF to tell us how long some text is
hiveCtx.registerFunction("strLenPython", lambda x: len(x), IntegerType())
lengthSchemaRDD = hiveCtx.sql("SELECT strLenPython('text') FROM tweets LIMIT 10")
Example 9-37. Scala string length UDF
registerFunction("strLenScala", (_: String).length)
val tweetLength = hiveCtx.sql("SELECT strLenScala('tweet') FROM tweets LIMIT 10")
There are some additional imports for Java to define UDFs. As with the functions we defined for RDDs, we extend a special class. Depending on the number of parameters, we extend UDF[N], as shown in Examples 9-38 and 9-39.
Example 9-38. Java UDF imports
// Import UDF function class and DataTypes
// Note: these import paths may change in a future release
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
Example 9-39. Java string length UDF
hiveCtx
.
udf
().
register
(
"stringLengthJava"
,
new
UDF1
<
String
,
Integer
>()
{
@Override
public
Integer
call
(
String
str
)
throws
Exception
{
return
str
.
length
();
}
},
DataTypes
.
IntegerType
);
SchemaRDD
tweetLength
=
hiveCtx
.
sql
(
"SELECT stringLengthJava('text') FROM tweets LIMIT 10"
);
List
<
Row
>
lengths
=
tweetLength
.
collect
();
for
(
Row
row
:
result
)
{
System
.
out
.
println
(
row
.
get
(
0
));
}
Hive UDFs
Spark SQL can also use existing Hive UDFs. The standard Hive UDFs are already automatically included. If you have a custom UDF, it is important to make sure that the JARs for your UDF are included with your application. If we run the JDBC server, note that we can add the JARs with the
--jars
command-line flag. Developing Hive UDFs
is beyond the scope of this book, so we will instead introduce how to use existing
Hive UDFs.
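In practice, using an existing Hive UDF usually comes down to making its JAR visible to Spark and registering the UDF class under a SQL function name with HiveQL's CREATE TEMPORARY FUNCTION. The sketch below is a hypothetical illustration: the JAR path, the class name com.example.hive.MyLowerUDF, and the function name myLower are placeholders, not part of the examples above.

```scala
// Hypothetical example: the JAR path, class name, and function name below
// are placeholders, not from this chapter's sample code.

// Make the JAR containing the UDF available to the session
// (alternatively, pass it at launch with --jars).
hiveCtx.sql("ADD JAR /path/to/my-udfs.jar")

// Expose the Hive UDF class under a SQL function name.
hiveCtx.sql("CREATE TEMPORARY FUNCTION myLower AS 'com.example.hive.MyLowerUDF'")

// The function can now be called like any built-in.
val lowered = hiveCtx.sql("SELECT myLower('TEXT') FROM tweets LIMIT 10")
```

Because the function is registered by fully qualified class name, the same two statements work for any Hive UDF already on the classpath, which is why shipping the JAR with the application (or via --jars) is the important step.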