>>> pythonLines.first()
u'## Interactive Python Shell'
Example 2-5. Scala filtering example
scala> val lines = sc.textFile("README.md") // Create an RDD called lines
lines: spark.RDD[String] = MappedRDD[...]
scala> val pythonLines = lines.filter(line => line.contains("Python"))
pythonLines: spark.RDD[String] = FilteredRDD[...]
scala> pythonLines.first()
res0: String = ## Interactive Python Shell
Passing Functions to Spark
If you are unfamiliar with the lambda or => syntax in Examples 2-4 and 2-5, it is a shorthand way to define functions inline in Python and Scala. When using Spark in these languages, you can also define a function separately and then pass its name to Spark. For example, in Python:
def hasPython(line):
    return "Python" in line

pythonLines = lines.filter(hasPython)
Passing functions to Spark is also possible in Java, but in this case they are defined as classes implementing an interface called Function. For example:
JavaRDD<String> pythonLines = lines.filter(
    new Function<String, Boolean>() {
        Boolean call(String line) { return line.contains("Python"); }
    }
);
Java 8 introduces shorthand syntax called lambdas that looks similar to Python and Scala. Here is how the code would look with this syntax:
JavaRDD<String> pythonLines = lines.filter(line -> line.contains("Python"));
We discuss passing functions further in "Passing Functions to Spark" on page 30.
While we will cover the Spark API in more detail later, a lot of its magic is that function-based operations like filter also parallelize across the cluster. That is, Spark automatically takes your function (e.g., line.contains("Python")) and ships it to executor nodes. Thus, you can write code in a single driver program and automatically have parts of it run on multiple nodes. Chapter 3 covers the RDD API in detail.
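To build intuition for how a shipped predicate parallelizes, here is a toy, single-process sketch (plain Python, not Spark's actual implementation): the data is split into partitions, each "executor" applies the driver's predicate to its own partition independently, and the results are combined. The partition contents below are made-up sample lines.

```python
# A predicate defined once in the "driver" program.
def has_python(line):
    return "Python" in line

# An "RDD" split into partitions that would live on different executors.
partitions = [
    ["## Interactive Python Shell", "high-level API"],
    ["Python docs", "cluster computing"],
]

# Each executor filters its own partition with the shipped predicate...
filtered_partitions = [
    [line for line in part if has_python(line)]
    for part in partitions
]

# ...and the driver sees the combined result, as if filter() ran locally.
python_lines = [line for part in filtered_partitions for line in part]
print(python_lines)  # ['## Interactive Python Shell', 'Python docs']
```

Because each partition is processed independently, the same predicate could run on many machines at once; real Spark additionally serializes the function (via cloudpickle in PySpark) before sending it to executors.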