Database Reference
In-Depth Information
Example 3-21. Scala function passing
class
SearchFunctions
(
val
query
:
String
)
{
def
isMatch
(
s
:
String
)
:
Boolean
=
{
s
.
contains
(
query
)
}
def
getMatchesFunctionReference
(
rdd
:
RDD
[
String
])
:
RDD
[
String
]
=
{
// Problem: "isMatch" means "this.isMatch", so we pass all of "this"
rdd
.
map
(
isMatch
)
}
def
getMatchesFieldReference
(
rdd
:
RDD
[
String
])
:
RDD
[
String
]
=
{
// Problem: "query" means "this.query", so we pass all of "this"
rdd
.
map
(
x
=>
x
.
split
(
query
))
}
def
getMatchesNoReference
(
rdd
:
RDD
[
String
])
:
RDD
[
String
]
=
{
// Safe: extract just the field we need into a local variable
val
query_
=
this
.
query
rdd
.
map
(
x
=>
x
.
split
(
query_
))
}
}
If
NotSerializableException
occurs in Scala, a reference to a method or field in a
nonserializable class is usually the problem. Note that passing in local serializable
variables or functions that are members of a top-level object is always safe.
Java
In Java, functions are specified as objects that implement one of Spark's function
interfaces from the
org.apache.spark.api.java.function
package. There are a
number of different interfaces based on the return type of the function. We show the
most basic function interfaces in
Table 3-1
, and cover a number of
other function
interfaces
for when we need to return special types of data, like key/value data, in
“Java” on page 43
.
Table 3-1. Standard Java function interfaces
Function name
Method to implement
Usage
Take in one input and return one output, for use with
operations like
map()
and
filter()
.
Function<T, R>
R call(T)
Take in two inputs and return one output, for use with
operations like
aggregate()
or
fold()
.
Function2<T1, T2, R>
R call(T1, T2)
Take in one input and return zero or more outputs, for use with
operations like
flatMap()
.
FlatMapFunction<T,
R>
Iterable<R>
call(T)