Example 2-8. Initializing Spark in Scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

val conf = new SparkConf().setMaster("local").setAppName("My App")
val sc = new SparkContext(conf)
Example 2-9. Initializing Spark in Java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf().setMaster("local").setAppName("My App");
JavaSparkContext sc = new JavaSparkContext(conf);
These examples show the minimal way to initialize a SparkContext, where you pass
two parameters:
• A cluster URL, namely local in these examples, which tells Spark how to connect
to a cluster. The value local is special: it runs Spark on one thread on the local
machine, without connecting to a cluster.
• An application name, namely My App in these examples. This name identifies your
application in the cluster manager's UI if you connect to a cluster.
Additional parameters exist for configuring how your application executes or for adding
code to be shipped to the cluster, but we will cover these in later chapters.
After you have initialized a SparkContext, you can use all the methods we showed
before to create RDDs (e.g., from a text file) and manipulate them.
Finally, to shut down Spark, you can either call the stop() method on your SparkContext,
or simply exit the application (e.g., with System.exit(0) or sys.exit()).
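To tie initialization, RDD use, and shutdown together, here is a minimal sketch of a complete Java application. It assumes spark-core is on the classpath; the class name MyApp, the helper countContaining, and the sample data are hypothetical. It uses parallelize() on an in-memory list instead of textFile() so that no input file is needed.

```java
import java.util.Arrays;
import java.util.List;

// Assumption: spark-core is on the classpath.
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class MyApp {

    // Hypothetical helper: builds an RDD from an in-memory list and counts
    // the elements that contain the given word.
    public static long countContaining(JavaSparkContext sc, List<String> data, String word) {
        JavaRDD<String> lines = sc.parallelize(data);
        return lines.filter(line -> line.contains(word)).count();
    }

    public static void main(String[] args) {
        // Same minimal configuration as Example 2-9: cluster URL and app name.
        SparkConf conf = new SparkConf().setMaster("local").setAppName("My App");
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            List<String> data = Arrays.asList("spark is fast", "hello world", "spark rdd");
            System.out.println(countContaining(sc, data, "spark")); // prints 2
        } finally {
            // Shut Spark down explicitly, even if the job above throws.
            sc.stop();
        }
    }
}
```

The try/finally guarantees that stop() is called even when a job fails; simply letting the JVM exit would also shut Spark down.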
This quick overview should be enough to let you run a standalone Spark application
on your laptop. For more advanced configuration, Chapter 7 will cover how to connect
your application to a cluster, including packaging your application so that its
code is automatically shipped to worker nodes. For now, please refer to the Quick
Start Guide in the official Spark documentation.
Building Standalone Applications
This wouldn't be a complete introductory chapter of a Big Data book if we didn't
have a word count example. On a single machine, implementing word count is simple,
but in distributed frameworks it is a common example because it involves reading
and combining data from many worker nodes. We will look at building and