Database Reference
In-Depth Information
Example 5-6. Loading unstructured JSON in Python
import json
data = input . map ( lambda x : json . loads ( x ))
In Scala and Java, it is common to load records into a class representing their sche‐
mas. At this stage, we may also want to skip invalid records. We show an example of
loading records as instances of a Person class.
Example 5-7. Loading JSON in Scala
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.DeserializationFeature
...
case class Person ( name : String , lovesPandas : Boolean ) // Must be a top-level class
...
// Parse it into a specific case class. We use flatMap to handle errors
// by returning an empty list (None) if we encounter an issue and a
// list with one element if everything is ok (Some(_)).
val result = input . flatMap ( record => {
try {
Some ( mapper . readValue ( record , classOf [ Person ]))
} catch {
case e : Exception => None
}})
Example 5-8. Loading JSON in Java
class ParseJson implements FlatMapFunction < Iterator < String >, Person > {
public Iterable < Person > call ( Iterator < String > lines ) throws Exception {
ArrayList < Person > people = new ArrayList < Person >();
ObjectMapper mapper = new ObjectMapper ();
while ( lines . hasNext ()) {
String line = lines . next ();
try {
people . add ( mapper . readValue ( line , Person . class ));
} catch ( Exception e ) {
// skip records on failure
}
}
return people ;
}
}
JavaRDD < String > input = sc . textFile ( "file.json" );
JavaRDD < Person > result = input . mapPartitions ( new ParseJson ());
Search WWH ::




Custom Search