Example 2-8. Initializing Spark in Scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

val conf = new SparkConf().setMaster("local").setAppName("My App")
val sc = new SparkContext(conf)
Example 2-9. Initializing Spark in Java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf().setMaster("local").setAppName("My App");
JavaSparkContext sc = new JavaSparkContext(conf);
These examples show the minimal way to initialize a SparkContext, where you pass
two parameters:
• A cluster URL, namely local in these examples, which tells Spark how to connect
  to a cluster. local is a special value that runs Spark on one thread on the local
  machine, without connecting to a cluster.
• An application name, namely My App in these examples. This will identify your
  application on the cluster manager's UI if you connect to a cluster.
Additional parameters exist for configuring how your application executes or adding
code to be shipped to the cluster, but we will cover these in later chapters of the book.
After you have initialized a SparkContext, you can use all the methods we showed
before to create RDDs (e.g., from a text file) and manipulate them.
Finally, to shut down Spark, you can either call the stop() method on your
SparkContext, or simply exit the application (e.g., with System.exit(0) or sys.exit()).
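The full lifecycle described above (initialize, use, shut down) can be sketched in Java as follows. This is a minimal sketch rather than a definitive pattern: the class name InitAndStop and the helper countWords are hypothetical names chosen for illustration, and the code assumes the Spark libraries are on the classpath. It places stop() in a finally block so that the context is released even if the job throws.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class InitAndStop {

    // Hypothetical helper (not from the book): counts the elements of a
    // small in-memory list by turning it into an RDD.
    static long countWords(List<String> words) {
        // Same two parameters as in Example 2-9: cluster URL and application name.
        SparkConf conf = new SparkConf().setMaster("local").setAppName("My App");
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            // Create an RDD from the list and run a simple action on it.
            JavaRDD<String> rdd = sc.parallelize(words);
            return rdd.count();
        } finally {
            // Shut down Spark explicitly, even if the job above failed.
            sc.stop();
        }
    }

    public static void main(String[] args) {
        long n = countWords(Arrays.asList("hello", "spark", "world"));
        System.out.println("Counted " + n + " elements");
    }
}
```

Calling stop() explicitly is mainly useful when the program continues to run after the Spark work is finished; if the process exits immediately afterward, simply exiting has the same effect, as noted above.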
This quick overview should be enough to let you run a standalone Spark application
on your laptop. For more advanced configuration, Chapter 7 will cover how to connect
your application to a cluster, including packaging your application so that its
code is automatically shipped to worker nodes. For now, please refer to the Quick
Start Guide in the official Spark documentation.
Building Standalone Applications
This wouldn't be a complete introductory chapter of a Big Data book if we didn't
have a word count example. On a single machine, implementing word count is simple,
but in distributed frameworks it is a common example because it involves reading
and combining data from many worker nodes. We will look at building and