Example 9-20. Parquet file save in Python
pandaFriends.saveAsParquetFile("hdfs://...")
JSON
If you have a JSON file with records fitting the same schema, Spark SQL can infer the
schema by scanning the file and let you access fields by name (Example 9-21). If you
have ever found yourself staring at a huge directory of JSON records, Spark SQL's
schema inference is a very effective way to start working with the data without
writing any special loading code.
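To make the idea concrete, here is a toy sketch of what schema inference does, written with only Python's standard library. It is not Spark's implementation; it just scans every record, unions the field names it sees, and marks a field nullable when some record lacks it, which is the same reasoning behind the schema Spark SQL produces:

```python
import json

# The two sample records from Example 9-21.
records = [
    '{"name": "Holden"}',
    '{"name": "Sparky The Bear", "lovesPandas": true,'
    ' "knows": {"friends": ["holden"]}}',
]

def infer_schema(lines):
    """Toy inference: map each field to (type names seen, nullable)."""
    fields = {}  # field name -> set of Python type names seen
    counts = {}  # field name -> number of records containing it
    for line in lines:
        rec = json.loads(line)
        for key, value in rec.items():
            fields.setdefault(key, set()).add(type(value).__name__)
            counts[key] = counts.get(key, 0) + 1
    total = len(lines)
    # A field absent from some record is nullable, echoing Spark's output.
    return {k: (sorted(fields[k]), counts[k] < total) for k in fields}

schema = infer_schema(records)
```

Running this on the sample records reports `name` as a non-nullable string, while `lovesPandas` and `knows` come out nullable because the first record omits them.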
To load our JSON data, all we need to do is call the jsonFile() function on our
hiveCtx, as shown in Examples 9-22 through 9-24. If you are curious about what the
inferred schema for your data is, you can call printSchema on the resulting
SchemaRDD (Example 9-25).
Example 9-21. Input records
{"name": "Holden"}
{"name": "Sparky The Bear", "lovesPandas": true, "knows": {"friends": ["holden"]}}
Example 9-22. Loading JSON with Spark SQL in Python
input = hiveCtx.jsonFile(inputFile)
Example 9-23. Loading JSON with Spark SQL in Scala
val input = hiveCtx.jsonFile(inputFile)
Example 9-24. Loading JSON with Spark SQL in Java
SchemaRDD input = hiveCtx.jsonFile(jsonFile);
Example 9-25. Resulting schema from printSchema()
root
|-- knows: struct (nullable = true)
| |-- friends: array (nullable = true)
| | |-- element: string (containsNull = false)
|-- lovesPandas: boolean (nullable = true)
|-- name: string (nullable = true)
You can also look at the schema generated for some tweets, as in Example 9-26.
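As a plain-Python illustration of the nesting that printSchema reports (using only the stdlib json module, not Spark), the second input record from Example 9-21 can be navigated field by field along the same paths the schema describes:

```python
import json

# The second sample record from Example 9-21.
record = json.loads(
    '{"name": "Sparky The Bear", "lovesPandas": true,'
    ' "knows": {"friends": ["holden"]}}'
)

# knows.friends is an array of strings in the inferred schema,
# so the nested value comes back as a Python list.
first_friend = record["knows"]["friends"][0]  # "holden"
loves = record["lovesPandas"]                 # True
```

In Spark SQL itself these same dotted paths (for example, `knows.friends`) are usable by name in queries against the loaded data, which is what makes the inferred schema immediately useful.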