Example 2-1. Python line count
>>> lines = sc.textFile("README.md")  # Create an RDD called lines
>>> lines.count()  # Count the number of items in this RDD
127
>>> lines.first()  # First item in this RDD, i.e. first line of README.md
u'# Apache Spark'
Example 2-2. Scala line count
scala> val lines = sc.textFile("README.md")  // Create an RDD called lines
lines: spark.RDD[String] = MappedRDD[...]
scala> lines.count()  // Count the number of items in this RDD
res0: Long = 127
scala> lines.first()  // First item in this RDD, i.e. first line of README.md
res1: String = # Apache Spark
To exit either shell, press Ctrl-D.
We will discuss the Spark UI more in Chapter 7, but one of the log messages you may have noticed is INFO SparkUI: Started SparkUI at http://[ipaddress]:4040. You can open that address in a browser to see all sorts of information about your tasks and cluster.
In Examples 2-1 and 2-2 , the variable called lines is an RDD, created here from a
text file on our local machine. We can run various parallel operations on the RDD,
such as counting the number of elements in the dataset (here, lines of text in the file)
or printing the first one. We will discuss RDDs in great depth in later chapters, but
before we go any further, let's take a moment now to introduce basic Spark concepts.
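As a quick illustration of another parallel operation, here is a short sketch in the Python shell, continuing the session from Example 2-1. The filter transformation and the variable name pythonLines are our own additions rather than part of the examples above, and the output depends on the contents of your README.md:
>>> pythonLines = lines.filter(lambda line: "Python" in line)  # New RDD of matching lines
>>> pythonLines.first()  # First line containing "Python" (result varies by file)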
Introduction to Core Spark Concepts
Now that you have run your first Spark code using the shell, it's time to learn about programming in Spark in more detail.
At a high level, every Spark application consists of a driver program that launches various parallel operations on a cluster. The driver program contains your application's main function and defines distributed datasets on the cluster, then applies operations to them. In the preceding examples, the driver program was the Spark shell itself, and you could just type in the operations you wanted to run.
Driver programs access Spark through a SparkContext object, which represents a connection to a computing cluster. In the shell, a SparkContext is automatically created for you as the variable called sc.
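Outside the shell, you create the SparkContext yourself. Below is a minimal sketch of a standalone Python driver program; the local master setting, the application name, and the file path are illustrative choices of ours, not values taken from the text:
from pyspark import SparkConf, SparkContext

# Configure the application: run locally under an app name of our choosing
conf = SparkConf().setMaster("local").setAppName("LineCount")
sc = SparkContext(conf=conf)

# Same line count as in the shell examples
lines = sc.textFile("README.md")
print(lines.count())

sc.stop()  # Shut down the connection to the cluster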