>tar xfvz spark-1.2.0-bin-hadoop2.4.tgz
>cd spark-1.2.0-bin-hadoop2.4
Spark places the user scripts for running Spark in the bin directory. You can test whether
everything is working correctly by running one of the example programs included with
Spark:
>./bin/run-example org.apache.spark.examples.SparkPi
This will run the example in Spark's local standalone mode. In this mode, all the Spark
processes are run within the same JVM, and Spark uses multiple threads for parallel
processing. By default, the preceding example uses a number of threads equal to the number
of cores available on your system. Once the program is finished running, you should see
something similar to the following lines near the end of the output:
14/11/27 20:58:47 INFO SparkContext: Job finished: reduce at SparkPi.scala:35, took 0.723269 s
Pi is roughly 3.1465
To configure the level of parallelism in local mode, you can pass in a master parameter
of the local[N] form, where N is the number of threads to use. For example, to
use only two threads, run the following command instead:
>MASTER=local[2] ./bin/run-example org.apache.spark.examples.SparkPi
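The same setting can also be made programmatically when you create your own
SparkContext rather than relying on the run-example script. The following is a minimal
sketch that mirrors the SparkPi computation with the master set to local[2]; the object
name LocalPi and the sample count are illustrative choices, not part of the Spark
distribution:

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: set the master to local[2] in code instead of via
// the MASTER environment variable. LocalPi and the sample count n
// are illustrative, not part of the Spark distribution.
object LocalPi {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LocalPi").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val n = 100000
    // Estimate Pi by sampling random points in the unit square and
    // counting how many fall inside the unit circle.
    val count = sc.parallelize(1 to n).map { _ =>
      val x = math.random * 2 - 1
      val y = math.random * 2 - 1
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    sc.stop()
  }
}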