Database Reference
In-Depth Information
One issue to watch out for when passing functions is inadvertently serializing the
object containing the function. When you pass a function that is the member of an
object, or contains references to fields in an object (e.g., self.field ), Spark sends the
entire object to worker nodes, which can be much larger than the bit of information
you need (see Example 3-19 ). Sometimes this can also cause your program to fail, if
your class contains objects that Python can't figure out how to pickle.
Example 3-19. Passing a function with field references (don't do this!)
class SearchFunctions ( object ):
def __init__ ( self , query ):
self . query = query
def isMatch ( self , s ):
return self . query in s
def getMatchesFunctionReference ( self , rdd ):
# Problem: references all of "self" in "self.isMatch"
return rdd . filter ( self . isMatch )
def getMatchesMemberReference ( self , rdd ):
# Problem: references all of "self" in "self.query"
return rdd . filter ( lambda x : self . query in x )
Instead, just extract the fields you need from your object into a local variable and pass
that in, like we do in Example 3-20 .
Example 3-20. Python function passing without field references
class WordFunctions ( object ):
...
def getMatchesNoReference ( self , rdd ):
# Safe: extract only the field we need into a local variable
query = self . query
return rdd . filter ( lambda x : query in x )
Scala
In Scala, we can pass in functions defined inline, references to methods, or static
functions as we do for Scala's other functional APIs. Some other considerations come
into play, though—namely that the function we pass and the data referenced in it
needs to be serializable (implementing Java's Serializable interface). Furthermore, as
in Python, passing a method or field of an object includes a reference to that whole
object, though this is less obvious because we are not forced to write these references
with self . As we did with Python in Example 3-20 , we can instead extract the fields
we need as local variables and avoid needing to pass the whole object containing
them, as shown in Example 3-21 .
Search WWH ::




Custom Search