Database Reference
In-Depth Information
Example 5-6. Loading unstructured JSON in Python
import
json
data
=
input
.
map
(
lambda
x
:
json
.
loads
(
x
))
In Scala and Java, it is common to load records into a class representing their sche‐
mas. At this stage, we may also want to skip invalid records. We show an example of
loading records as instances of a Person class.
Example 5-7. Loading JSON in Scala
import
com.fasterxml.jackson.module.scala.DefaultScalaModule
import
com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper
import
com.fasterxml.jackson.databind.ObjectMapper
import
com.fasterxml.jackson.databind.DeserializationFeature
...
case
class
Person
(
name
:
String
,
lovesPandas
:
Boolean
)
// Must be a top-level class
...
// Parse it into a specific case class. We use flatMap to handle errors
// by returning an empty list (None) if we encounter an issue and a
// list with one element if everything is ok (Some(_)).
val
result
=
input
.
flatMap
(
record
=>
{
try
{
Some
(
mapper
.
readValue
(
record
,
classOf
[
Person
]))
}
catch
{
case
e
:
Exception
=>
None
}})
Example 5-8. Loading JSON in Java
class
ParseJson
implements
FlatMapFunction
<
Iterator
<
String
>,
Person
>
{
public
Iterable
<
Person
>
call
(
Iterator
<
String
>
lines
)
throws
Exception
{
ArrayList
<
Person
>
people
=
new
ArrayList
<
Person
>();
ObjectMapper
mapper
=
new
ObjectMapper
();
while
(
lines
.
hasNext
())
{
String
line
=
lines
.
next
();
try
{
people
.
add
(
mapper
.
readValue
(
line
,
Person
.
class
));
}
catch
(
Exception
e
)
{
// skip records on failure
}
}
return
people
;
}
}
JavaRDD
<
String
>
input
=
sc
.
textFile
(
"file.json"
);
JavaRDD
<
Person
>
result
=
input
.
mapPartitions
(
new
ParseJson
());