Handling incorrectly formatted records can be a big problem, especially
with semistructured data like JSON. With small datasets it can be
acceptable to stop the world (i.e., fail the program) on malformed
input, but with large datasets malformed input is often simply a part
of life. If you do choose to skip incorrectly formatted data, you may
wish to use accumulators to keep track of the number of errors.
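The skip-and-count approach can be sketched in plain Python. This is only an illustration: ErrorCounter is a hypothetical stand-in for a Spark accumulator, and parse_or_skip is a name invented here, not something from the earlier examples.

```python
import json


class ErrorCounter:
    """Hypothetical stand-in for a Spark accumulator (add() plus a value)."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n


def parse_or_skip(line, errors):
    """Return [record] for well-formed JSON, or [] after counting an error."""
    try:
        return [json.loads(line)]
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        errors.add(1)
        return []


lines = [
    '{"name": "Holden", "lovesPandas": true}',
    '{"name": "Sparky", "lovesPandas": ',  # truncated record
    'not json at all',
]

errors = ErrorCounter()
# Flatten the per-line results, much as flatMap would on an RDD of strings.
records = [rec for line in lines for rec in parse_or_skip(line, errors)]
```

In a Spark job, the counter would presumably be created with sc.accumulator(0) and the function passed to flatMap(), so that malformed lines are dropped while the error count is reported back to the driver.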
Saving JSON
Writing out JSON files is much simpler than loading them, because we don't
have to worry about incorrectly formatted data and we know the type of the data that
we are writing out. We can use the same libraries we used to convert our RDD of
strings into parsed JSON data, this time taking our RDD of structured data and
converting it into an RDD of strings, which we can then write out using Spark's text file API.
Let's say we were running a promotion for people who love pandas. We can take our
input from the first step and filter it for the people who love pandas, as shown in
Examples 5-9 through 5-11.
Example 5-9. Saving JSON in Python
import json
# Keep the records that love pandas, serialize each to a JSON string, and save
(data.filter(lambda x: x['lovesPandas']).map(lambda x: json.dumps(x))
    .saveAsTextFile(outputFile))
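To see what each saved line would look like, the filter and map steps of Example 5-9 can be tried on an ordinary Python list. This is a minimal sketch: the sample records and the to_json_lines helper are made up for illustration and do not come from the earlier loading examples.

```python
import json


def to_json_lines(records):
    """Keep records that love pandas and serialize each to one JSON string."""
    return [json.dumps(r) for r in records if r["lovesPandas"]]


# Hypothetical sample data standing in for the parsed RDD.
people = [
    {"name": "Holden", "lovesPandas": True},
    {"name": "Sparky", "lovesPandas": False},
]

lines = to_json_lines(people)
for line in lines:
    print(line)  # prints {"name": "Holden", "lovesPandas": true}
```

Each element of the result is a complete JSON document on a single line, which is the shape a line-oriented text output expects.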
Example 5-10. Saving JSON in Scala
result.filter(p => p.lovesPandas).map(mapper.writeValueAsString(_))
  .saveAsTextFile(outputFile)
Example 5-11. Saving JSON in Java
class WriteJson implements FlatMapFunction<Iterator<Person>, String> {
  public Iterable<String> call(Iterator<Person> people) throws Exception {
    ArrayList<String> text = new ArrayList<String>();
    ObjectMapper mapper = new ObjectMapper();
    while (people.hasNext()) {
      Person person = people.next();
      text.add(mapper.writeValueAsString(person));
    }
    return text;
  }
}
JavaRDD<Person> result = input.mapPartitions(new ParseJson()).filter(
  new LikesPandas());
JavaRDD<String> formatted = result.mapPartitions(new WriteJson());
formatted.saveAsTextFile(outfile);