Downloading Spark and Getting Started - Learning Spark

Example 2-8. Initializing Spark in Scala

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

val conf = new SparkConf().setMaster("local").setAppName("My App")
val sc = new SparkContext(conf)

Example 2-9. Initializing Spark in Java

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf().setMaster("local").setAppName("My App");
JavaSparkContext sc = new JavaSparkContext(conf);

These examples show the minimal way to initialize a SparkContext, where you pass two parameters:

• A cluster URL, namely local in these examples, which tells Spark how to connect to a cluster. local is a special value that runs Spark on one thread on the local machine, without connecting to a cluster.

• An application name, namely My App in these examples. This will identify your application on the cluster manager's UI if you connect to a cluster.
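
To make the cluster URL parameter more concrete: besides plain local, Spark also accepts local[N] to run with N threads and local[*] to use as many threads as the machine has logical cores. The following Java sketch is not Spark code. LocalMaster and its threads() method are hypothetical names introduced here only to spell out that convention, and the spark://host:7077 string is a made-up cluster URL used to show the non-local case.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Spark: reports how many local threads a master string asks for. */
final class LocalMaster {
    // Matches "local[4]" or "local[*]"; group 1 holds the text inside the brackets.
    private static final Pattern BRACKETED = Pattern.compile("local\\[(\\*|\\d+)\\]");

    /** Returns the thread count for a local master string, or empty for anything else. */
    static OptionalInt threads(String master) {
        if (master.equals("local")) {
            return OptionalInt.of(1); // plain "local": a single thread
        }
        Matcher m = BRACKETED.matcher(master);
        if (!m.matches()) {
            return OptionalInt.empty(); // not a local master string, e.g. a cluster URL
        }
        String n = m.group(1);
        if (n.equals("*")) {
            // "local[*]": one thread per logical core on this machine
            return OptionalInt.of(Runtime.getRuntime().availableProcessors());
        }
        return OptionalInt.of(Integer.parseInt(n)); // "local[N]": N threads
    }

    public static void main(String[] args) {
        // The last entry is a made-up cluster URL, included to show the empty result.
        String[] masters = {"local", "local[4]", "local[*]", "spark://host:7077"};
        for (String master : masters) {
            System.out.println(master + " -> " + threads(master));
        }
    }
}
```

In a real application you would not parse the string yourself. You would pass it to setMaster() as in the examples above and let Spark interpret it.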

Additional parameters exist for configuring how your application executes or adding code to be shipped to the cluster, but we will cover these in later chapters of the book.

After you have initialized a SparkContext, you can use all the methods we showed before to create RDDs (e.g., from a text file) and manipulate them.

Finally, to shut down Spark, you can either call the stop() method on your SparkContext, or simply exit the application (e.g., with System.exit(0) or sys.exit()).

This quick overview should be enough to let you run a standalone Spark application on your laptop. For more advanced configuration, Chapter 7 will cover how to connect your application to a cluster, including packaging your application so that its code is automatically shipped to worker nodes. For now, please refer to the Quick Start Guide in the official Spark documentation.

Building Standalone Applications

This wouldn't be a complete introductory chapter of a Big Data book if we didn't have a word count example. On a single machine, implementing word count is simple, but in distributed frameworks it is a common example because it involves reading and combining data from many worker nodes. We will look at building and
