Example 9-20. Parquet file save in Python
pandaFriends.saveAsParquetFile("hdfs://...")
JSON
If you have a JSON file with records fitting the same schema, Spark SQL can infer the
schema by scanning the file and let you access fields by name (Example 9-21). If you
have ever found yourself staring at a huge directory of JSON records, Spark SQL's
schema inference is a very effective way to start working with the data without
writing any special loading code.
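To make the idea concrete, here is a toy sketch of what schema inference does, written with only Python's standard library. It is not Spark's implementation; it just scans every record, unions the field names it sees, and marks a field nullable when some record lacks it, which is the same reasoning behind the schema Spark SQL produces:

```python
import json

# The two sample records from Example 9-21.
records = [
    '{"name": "Holden"}',
    '{"name": "Sparky The Bear", "lovesPandas": true,'
    ' "knows": {"friends": ["holden"]}}',
]

def infer_schema(lines):
    """Toy inference: map each field to (type names seen, nullable)."""
    fields = {}  # field name -> set of Python type names seen
    counts = {}  # field name -> number of records containing it
    for line in lines:
        rec = json.loads(line)
        for key, value in rec.items():
            fields.setdefault(key, set()).add(type(value).__name__)
            counts[key] = counts.get(key, 0) + 1
    total = len(lines)
    # A field absent from some record is nullable, echoing Spark's output.
    return {k: (sorted(fields[k]), counts[k] < total) for k in fields}

schema = infer_schema(records)
```

Running this on the sample records reports `name` as a non-nullable string, while `lovesPandas` and `knows` come out nullable because the first record omits them.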
To load our JSON data, all we need to do is call the jsonFile() function on our
hiveCtx, as shown in Examples 9-22 through 9-24. If you are curious about what the
inferred schema for your data is, you can call printSchema on the resulting
SchemaRDD (Example 9-25).
Example 9-21. Input records
{"name": "Holden"}
{"name": "Sparky The Bear", "lovesPandas": true, "knows": {"friends": ["holden"]}}
Example 9-22. Loading JSON with Spark SQL in Python
input = hiveCtx.jsonFile(inputFile)
Example 9-23. Loading JSON with Spark SQL in Scala
val input = hiveCtx.jsonFile(inputFile)
Example 9-24. Loading JSON with Spark SQL in Java
SchemaRDD input = hiveCtx.jsonFile(jsonFile);
Example 9-25. Resulting schema from printSchema()
root
|-- knows: struct (nullable = true)
| |-- friends: array (nullable = true)
| | |-- element: string (containsNull = false)
|-- lovesPandas: boolean (nullable = true)
|-- name: string (nullable = true)
You can also look at the schema generated for some tweets, as in Example 9-26.
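As a plain-Python illustration of the nesting that printSchema reports (using only the stdlib json module, not Spark), the second input record from Example 9-21 can be navigated field by field along the same paths the schema describes:

```python
import json

# The second sample record from Example 9-21.
record = json.loads(
    '{"name": "Sparky The Bear", "lovesPandas": true,'
    ' "knows": {"friends": ["holden"]}}'
)

# knows.friends is an array of strings in the inferred schema,
# so the nested value comes back as a Python list.
first_friend = record["knows"]["friends"][0]  # "holden"
loves = record["lovesPandas"]                 # True
```

In Spark SQL itself these same dotted paths (for example, `knows.friends`) are usable by name in queries against the loaded data, which is what makes the inferred schema immediately useful.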