Spark SQL - Learning Spark

Example 9-4. Java SQL imports

// Import Spark SQL
import org.apache.spark.sql.hive.HiveContext;
// Or if you can't have the hive dependencies
import org.apache.spark.sql.SQLContext;
// Import the SchemaRDD and Row classes
import org.apache.spark.sql.SchemaRDD;
import org.apache.spark.sql.Row;

Example 9-5. Python SQL imports

# Import Spark SQL
from pyspark.sql import HiveContext, Row
# Or if you can't include the hive requirements
from pyspark.sql import SQLContext, Row

Once we've added our imports, we need to create a HiveContext, or a SQLContext if we cannot bring in the Hive dependencies (see Examples 9-6 through 9-8). Both of these classes take a SparkContext to run on.

Example 9-6. Constructing a SQL context in Scala

val sc = new SparkContext(...)
val hiveCtx = new HiveContext(sc)

Example 9-7. Constructing a SQL context in Java

JavaSparkContext ctx = new JavaSparkContext(...);
SQLContext sqlCtx = new HiveContext(ctx);

Example 9-8. Constructing a SQL context in Python

hiveCtx = HiveContext(sc)

Now that we have a HiveContext or SQLContext, we are ready to load our data and query it.

Basic Query Example

To make a query against a table, we call the sql() method on the HiveContext or SQLContext. The first thing we need to do is tell Spark SQL about some data to query. In this case we will load some Twitter data from JSON, and give it a name by registering it as a “temporary table” so we can query it with SQL. (We will go over more details on loading in “Loading and Saving Data” on page 170.) Then we can select the top tweets by retweetCount. See Examples 9-9 through 9-11.
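
The three steps described above (load the JSON, register it under a name, query it) can be sketched in Python as a small helper. This is a minimal sketch, assuming the Spark 1.x Python API, where the context exposes jsonFile() and the loaded data exposes registerTempTable(). The helper name top_tweets, the table name tweets, the text column, and the limit of 10 are assumptions made for illustration; only retweetCount comes from the text.

```python
# Sketch of the load -> register -> query flow, assuming the Spark 1.x
# Python API (jsonFile and registerTempTable). The table name, the text
# column, and the row limit are assumptions made for illustration.

TOP_TWEETS_QUERY = (
    "SELECT text, retweetCount FROM tweets "
    "ORDER BY retweetCount DESC LIMIT 10"
)


def top_tweets(sql_ctx, input_file):
    """Return the ten most-retweeted tweets from a JSON file of tweets.

    sql_ctx is a HiveContext or SQLContext built from a SparkContext.
    """
    # Load the JSON records; Spark SQL infers the schema from the data.
    tweets = sql_ctx.jsonFile(input_file)
    # Register the data as a temporary table so SQL can refer to it by name.
    tweets.registerTempTable("tweets")
    # Run the query through the context's sql() method and return the result.
    return sql_ctx.sql(TOP_TWEETS_QUERY)
```

With a live context this might be called as top_tweets(hiveCtx, "tweets.json"), where the file path is hypothetical. Later Spark releases deprecate jsonFile() and registerTempTable() in favor of read.json() and createOrReplaceTempView(), so the calls would need adjusting there.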
