Database Reference
In-Depth Information
which creates a directory called output containing the partition files:
% cat output/part-*
(1950,22)
(1949,111)
The saveAsTextFile() method also triggers a Spark job. The main difference is that
no value is returned, and instead the RDD is computed and its partitions are written to
files in the output directory.
Spark Applications, Jobs, Stages, and Tasks
As we've seen in the example, like MapReduce, Spark has the concept of a job . A Spark
job is more general than a MapReduce job, though, since it is made up of an arbitrary dir-
ected acyclic graph (DAG) of stages , each of which is roughly equivalent to a map or re-
duce phase in MapReduce.
Stages are split into tasks by the Spark runtime and are run in parallel on partitions of an
RDD spread across the cluster — just like tasks in MapReduce.
A job always runs in the context of an application (represented by a SparkContext in-
stance) that serves to group RDDs and shared variables. An application can run more than
one job, in series or in parallel, and provides the mechanism for a job to access an RDD
that was cached by a previous job in the same application. (We will see how to cache
RDDs in Persistence . ) An interactive Spark session, such as a spark-shell session, is just
an instance of an application.
A Scala Standalone Application
After working with the Spark shell to refine a program, you may want to package it into a
self-contained application that can be run more than once. The Scala program in
Example 19-1 shows how to do this.
Example 19-1. Scala application to find the maximum temperature, using Spark
import org.apache.spark.SparkContext._
import org.apache.spark. { SparkConf , SparkContext }
object MaxTemperature {
def main ( args : Array [ String ]) {
val conf = new SparkConf (). setAppName ( "Max Temperature" )
val sc = new SparkContext ( conf )
sc . textFile ( args ( 0 ))
. map ( _ . split ( "\t" ))
. filter ( rec => ( rec ( 1 ) != "9999" && rec ( 2 ). matches ( "[01459]" )))
Search WWH ::




Custom Search