Example 3-15. Python error count using actions
# count() and take() are actions: they trigger computation and return results to the driver
print("Input had " + str(badLinesRDD.count()) + " concerning lines")
print("Here are 10 examples:")
for line in badLinesRDD.take(10):
    print(line)
Example 3-16. Scala error count using actions
println("Input had " + badLinesRDD.count() + " concerning lines")
println("Here are 10 examples:")
badLinesRDD.take(10).foreach(println)
Example 3-17. Java error count using actions
System.out.println("Input had " + badLinesRDD.count() + " concerning lines");
System.out.println("Here are 10 examples:");
for (String line : badLinesRDD.take(10)) {
    System.out.println(line);
}
In this example, we used take() to retrieve a small number of elements from the RDD
to the driver program. We then iterated over them locally to print out information at
the driver. RDDs also have a collect() function to retrieve the entire RDD. This can
be useful if your program filters RDDs down to a very small size and you'd like to
deal with the result locally. Keep in mind that your entire dataset must fit in memory on a
single machine (the driver) to use collect() on it, so collect() shouldn't be used on large
datasets.
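The difference between take() and collect() can be sketched with a toy, single-process model in plain Python. This is not Spark. A generator stands in for a large dataset, take() is modeled with itertools.islice, and collect() is modeled with list(). The names read_lines, take, and collect are hypothetical illustrations.

The model only captures the principle. Spark's real take() scans one partition first, then estimates how many more partitions it needs, so it may compute more than n elements before stopping.

```python
from itertools import islice

def read_lines(total, counter):
    """Hypothetical large data source; counts every line it produces."""
    for i in range(total):
        counter["read"] += 1
        yield f"line {i}: error" if i % 3 == 0 else f"line {i}: ok"

def take(items, n):
    """Toy take(): keep at most n elements, then stop pulling input."""
    return list(islice(items, n))

def collect(items):
    """Toy collect(): materialize every element in local memory."""
    return list(items)

# take(10): only as much input as needed to find 10 matching lines is read.
take_counter = {"read": 0}
bad = (line for line in read_lines(100_000, take_counter) if "error" in line)
first_ten = take(bad, 10)
print(len(first_ten), take_counter["read"])  # 10 28

# collect(): every input line is read and every match is held in memory.
collect_counter = {"read": 0}
bad = (line for line in read_lines(100_000, collect_counter) if "error" in line)
everything = collect(bad)
print(len(everything), collect_counter["read"])  # 33334 100000
```

In the model, take(10) stops after reading 28 of the 100,000 input lines. collect() reads the entire input and keeps every match in local memory. This is why collect() belongs only after the data has been filtered down to something small.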
In most cases RDDs can't just be collected to the driver because they are too
large. In these cases, it's common to write data out to a distributed storage system
such as HDFS or Amazon S3. You can save the contents of an RDD using the
saveAsTextFile() action, saveAsSequenceFile(), or any of a number of actions for
various built-in formats. We will cover the different options for exporting data in
Chapter 5.
It is important to note that each time we call a new action, the entire RDD must be
computed “from scratch.” To avoid this inefficiency, users can persist intermediate
results, as we will cover in “Persistence (Caching)” on page 44.
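The cost of recomputation can be made concrete with another toy, single-process model in plain Python. ToyRDD, load_log, is_concerning, and the source_reads counter are hypothetical names, not Spark's API.

In this sketch:

* Each ToyRDD wraps a function that rebuilds its data from its parent.
* Every action replays that chain back to the source.
* cache() loosely mirrors Spark's cache()/persist(): it keeps the first materialized result for later actions.

```python
class ToyRDD:
    """Hypothetical single-process stand-in for an RDD (not Spark's API).

    Each instance holds a zero-argument function that rebuilds its data
    from its parent, so every action replays the chain back to the source
    unless cache() was requested and a result has been stored.
    """

    def __init__(self, compute):
        self._compute = compute
        self._cache_requested = False
        self._cached = None

    def filter(self, pred):
        # Record the step only; nothing is evaluated until an action runs.
        return ToyRDD(lambda: [x for x in self._materialize() if pred(x)])

    def cache(self):
        # Mark for reuse; data is stored the first time an action computes it.
        self._cache_requested = True
        return self

    def _materialize(self):
        if self._cached is not None:
            return self._cached
        data = self._compute()
        if self._cache_requested:
            self._cached = data
        return data

    # Actions
    def count(self):
        return len(self._materialize())

    def take(self, n):
        return self._materialize()[:n]


source_reads = {"n": 0}

def load_log():
    # Hypothetical input; counts how many times the source is re-read.
    source_reads["n"] += 1
    return ["ok", "error: disk", "warning: cpu", "error: net", "ok"]

def is_concerning(line):
    return "error" in line or "warning" in line

# Without caching, each action recomputes from the source.
bad_lines = ToyRDD(load_log).filter(is_concerning)
bad_lines.count()
bad_lines.take(10)
print(source_reads["n"])  # 2

# With cache(), the first action materializes the data and the second reuses it.
source_reads["n"] = 0
cached_bad = ToyRDD(load_log).filter(is_concerning).cache()
cached_bad.count()
cached_bad.take(10)
print(source_reads["n"])  # 1
```

Two actions on the uncached RDD read the source twice. The cached version reads it once. Real Spark adds storage levels, partition-level caching, and eviction, none of which this model attempts.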
Lazy Evaluation
As you read earlier, transformations on RDDs are lazily evaluated, meaning that
Spark will not begin to execute until it sees an action. This can be somewhat
counterintuitive for new users, but may be familiar to those who have used functional
languages such as Haskell or LINQ-like data processing frameworks.
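Python 3's built-in filter() and map() offer a loosely analogous mental model: they return lazy iterators, so building a pipeline runs nothing until values are pulled. This is plain Python, not Spark, and is_bad, lines, and checked are hypothetical names.

The analogy is limited. Spark records the whole chain of transformations and runs it on the cluster when an action is called, while Python simply pulls elements one at a time.

```python
checked = []

def is_bad(line):
    # Record every line the predicate actually inspects.
    checked.append(line)
    return "error" in line

lines = ["ok", "error: disk full", "ok", "error: timeout", "ok"]

# Building the pipeline runs nothing: filter() returns a lazy iterator.
bad_lines = filter(is_bad, lines)
print(checked)            # []

# Pulling a value (the "action") triggers evaluation, only as far as needed.
first = next(bad_lines)
print(first, checked)     # error: disk full ['ok', 'error: disk full']
```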