# count purchases per product and find the most popular one
products = data.map(lambda record: (record[1], 1.0)).reduceByKey(lambda a, b: a + b).collect()
mostPopular = sorted(products, key=lambda x: x[1], reverse=True)[0]
print "Total purchases: %d" % numPurchases
print "Unique users: %d" % uniqueUsers
print "Total revenue: %2.2f" % totalRevenue
print "Most popular product: %s with %d purchases" % (mostPopular[0], mostPopular[1])
If you compare the Scala and Python versions of our program, you will see that the syntax looks very similar in general. One key difference is how we express anonymous functions (also called lambda functions; hence the lambda keyword in the Python syntax). In Scala, we've seen that an anonymous function mapping an input x to an output y is expressed as x => y, while in Python it is lambda x: y. In the reduceByKey line in the preceding code, we apply an anonymous function that maps two inputs, a and b, generally of the same type, to an output. In this case, the function we apply is addition; hence lambda a, b: a + b.
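As a minimal standalone sketch of this two-argument lambda form (the app name and RDD contents below are illustrative, not taken from the sample project), you can see how reduceByKey merges the values for each key:

from pyspark import SparkContext

sc = SparkContext("local[2]", "Lambda Example")
pairs = sc.parallelize([("iPhone Cover", 1.0), ("Headphones", 1.0),
    ("iPhone Cover", 1.0)])
# reduceByKey applies the lambda pairwise to values sharing a key, so
# the two ("iPhone Cover", 1.0) records become ("iPhone Cover", 2.0)
counts = pairs.reduceByKey(lambda a, b: a + b).collect()
print counts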
The best way to run the script is to execute the following command from the base directory of the sample project:
>$SPARK_HOME/bin/spark-submit pythonapp.py
Here, the SPARK_HOME variable should be replaced with the path of the directory in
which you originally unpacked the Spark prebuilt binary package at the start of this
chapter.
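For example, assuming Spark was unpacked to /path/to/spark (an illustrative location; substitute your own), you could set the variable and run the app as follows:
>export SPARK_HOME=/path/to/spark
>$SPARK_HOME/bin/spark-submit pythonapp.py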
Upon running the script, you should see output similar to that of the Scala and Java examples, with the results of our computation being the same:
...
14/01/30 11:43:47 INFO SparkContext: Job finished: collect at pythonapp.py:14, took 0.050251 s
Total purchases: 5
Unique users: 4
Total revenue: 39.91
Most popular product: iPhone Cover with 2 purchases