Standalone Applications
The final piece missing in this quick tour of Spark is how to use it in standalone programs. Apart from running interactively, Spark can be linked into standalone applications in Java, Scala, or Python. The main difference from using it in the shell is that you need to initialize your own SparkContext. After that, the API is the same.
The process of linking to Spark varies by language. In Java and Scala, you give your application a Maven dependency on the spark-core artifact. As of the time of writing, the latest Spark version is 1.2.0, and the Maven coordinates for that are:

groupId = org.apache.spark
artifactId = spark-core_2.10
version = 1.2.0
Maven is a popular package management tool for Java-based languages that lets you
link to libraries in public repositories. You can use Maven itself to build your project,
or use other tools that can talk to the Maven repositories, including Scala's sbt tool or
Gradle. Popular integrated development environments like Eclipse also allow you to
directly add a Maven dependency to a project.
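For instance, with sbt the same dependency can be declared with a single line in your build file. This is a minimal sketch rather than an example from this chapter; it simply restates the Maven coordinates above in sbt's standard libraryDependencies syntax:

// build.sbt: declare the spark-core dependency using the coordinates above
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.2.0"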
In Python, you simply write applications as Python scripts, but you must run them using the bin/spark-submit script included in Spark. The spark-submit script takes care of including the Spark dependencies for Python: it sets up the environment for Spark's Python API to function. Simply run your script with the line given in Example 2-6.
Example 2-6. Running a Python script
bin/spark-submit my_script.py
(Note that you will have to use backslashes instead of forward slashes on Windows.)
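For instance, the command from Example 2-6 would be written as follows on Windows:

bin\spark-submit my_script.py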
Initializing a SparkContext
Once you have linked an application to Spark, you need to import the Spark packages in your program and create a SparkContext. You do so by first creating a SparkConf object to configure your application, and then building a SparkContext for it. Examples 2-7 through 2-9 demonstrate this in each supported language.
Example 2-7. Initializing Spark in Python
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf=conf)
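Examples 2-8 and 2-9 (the Scala and Java versions) do not appear in this excerpt. As a rough sketch, the Scala equivalent of the code above follows the same two steps, assuming the same local master and application name:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

// Build a SparkConf with a master URL and an application name,
// then construct the SparkContext from it
val conf = new SparkConf().setMaster("local").setAppName("My App")
val sc = new SparkContext(conf)

(In Java, the analogous entry point is JavaSparkContext, in the org.apache.spark.api.java package.)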