Database Reference
In-Depth Information
Example 3-21. Scala function passing
class SearchFunctions ( val query : String ) {
def isMatch ( s : String ) : Boolean = {
s . contains ( query )
}
def getMatchesFunctionReference ( rdd : RDD [ String ]) : RDD [ String ] = {
// Problem: "isMatch" means "this.isMatch", so we pass all of "this"
rdd . map ( isMatch )
}
def getMatchesFieldReference ( rdd : RDD [ String ]) : RDD [ String ] = {
// Problem: "query" means "this.query", so we pass all of "this"
rdd . map ( x => x . split ( query ))
}
def getMatchesNoReference ( rdd : RDD [ String ]) : RDD [ String ] = {
// Safe: extract just the field we need into a local variable
val query_ = this . query
rdd . map ( x => x . split ( query_ ))
}
}
If NotSerializableException occurs in Scala, a reference to a method or field in a
nonserializable class is usually the problem. Note that passing in local serializable
variables or functions that are members of a top-level object is always safe.
Java
In Java, functions are specified as objects that implement one of Spark's function
interfaces from the org.apache.spark.api.java.function package. There are a
number of different interfaces based on the return type of the function. We show the
most basic function interfaces in Table 3-1 , and cover a number of other function
interfaces for when we need to return special types of data, like key/value data, in
“Java” on page 43 .
Table 3-1. Standard Java function interfaces
Function name
Method to implement
Usage
Take in one input and return one output, for use with
operations like map() and filter() .
Function<T, R>
R call(T)
Take in two inputs and return one output, for use with
operations like aggregate() or fold() .
Function2<T1, T2, R>
R call(T1, T2)
Take in one input and return zero or more outputs, for use with
operations like flatMap() .
FlatMapFunction<T,
R>
Iterable<R>
call(T)
 
Search WWH ::




Custom Search