Example 6-19. Removing outliers in Python
import math

# Convert our RDD of strings to numeric data so we can compute stats and
# remove the outliers.
distanceNumerics = distances.map(lambda string: float(string))
stats = distanceNumerics.stats()
stddev = stats.stdev()
mean = stats.mean()
# Keep only the values within three standard deviations of the mean.
reasonableDistances = distanceNumerics.filter(
    lambda x: math.fabs(x - mean) < 3 * stddev)
print(reasonableDistances.collect())
Example 6-20. Removing outliers in Scala
// Now we can remove the outliers, since those may have misreported locations.
// First we need to take our RDD of strings and turn it into doubles.
val distanceDoubles = distances.map(string => string.toDouble)
val stats = distanceDoubles.stats()
val stddev = stats.stdev
val mean = stats.mean
val reasonableDistances = distanceDoubles.filter(x => math.abs(x - mean) < 3 * stddev)
println(reasonableDistances.collect().toList)
Example 6-21. Removing outliers in Java
// First we need to convert our RDD of String to a DoubleRDD so we can
// access the stats function
JavaDoubleRDD distanceDoubles = distances.mapToDouble(new DoubleFunction<String>() {
    public double call(String value) {
      return Double.parseDouble(value);
    }});
final StatCounter stats = distanceDoubles.stats();
final Double stddev = stats.stdev();
final Double mean = stats.mean();
JavaDoubleRDD reasonableDistances =
  distanceDoubles.filter(new Function<Double, Boolean>() {
    public Boolean call(Double x) {
      return (Math.abs(x - mean) < 3 * stddev);}});
System.out.println(StringUtils.join(reasonableDistances.collect(), ","));
With that final piece we have completed our sample application, which uses
accumulators and broadcast variables, per-partition processing, interfaces with
external programs, and summary statistics. The entire source code is available in
src/python/ChapterSixExample.py,
src/main/scala/com/oreilly/learningsparkexamples/scala/ChapterSixExample.scala, and
src/main/java/com/oreilly/learningsparkexamples/java/ChapterSixExample.java,
respectively.
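As a rough illustration of what the three outlier-removal examples compute, here is a minimal plain-Python sketch with no Spark dependency. The function name `remove_outliers`, its `num_stddevs` parameter, and the sample data are hypothetical and exist only for this illustration. The sketch also assumes the population standard deviation (dividing by n), which is our understanding of what `stdev()` on the stats result reports.

```python
import math


def remove_outliers(values, num_stddevs=3.0):
    """Keep values within num_stddevs standard deviations of the mean.

    Hypothetical helper for illustration; not part of the Spark API.
    """
    n = len(values)
    if n == 0:
        return []
    mean = sum(values) / n
    # Assumption: population variance (divide by n, not n - 1).
    variance = sum((x - mean) ** 2 for x in values) / n
    stddev = math.sqrt(variance)
    # Same predicate as the filter step in the examples above.
    return [x for x in values if math.fabs(x - mean) < num_stddevs * stddev]


# Hypothetical input: distances arrive as strings, as in the examples.
distances = ["10.0"] * 19 + ["1000.0"]
reasonable = remove_outliers([float(s) for s in distances])
print(reasonable)
```

With this made-up input the single value of 1000.0 lies more than three standard deviations from the mean and is dropped, while the nineteen values of 10.0 are kept. In the Spark versions, the mean and standard deviation come from one `stats()` pass over the RDD instead of being computed by hand.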