Example 9-4. Java SQL imports
// Import Spark SQL
import org.apache.spark.sql.hive.HiveContext;
// Or if you can't have the hive dependencies
import org.apache.spark.sql.SQLContext;
// Import the JavaSchemaRDD
import org.apache.spark.sql.SchemaRDD;
import org.apache.spark.sql.Row;
Example 9-5. Python SQL imports
# Import Spark SQL
from pyspark.sql import HiveContext, Row
# Or if you can't include the hive requirements
from pyspark.sql import SQLContext, Row
Once we've added our imports, we need to create a HiveContext, or a SQLContext if
we cannot bring in the Hive dependencies (see Examples 9-6 through 9-8). Both of
these classes take a SparkContext to run on.
Example 9-6. Constructing a SQL context in Scala
val sc = new SparkContext(...)
val hiveCtx = new HiveContext(sc)
Example 9-7. Constructing a SQL context in Java
JavaSparkContext ctx = new JavaSparkContext(...);
SQLContext sqlCtx = new HiveContext(ctx);
Example 9-8. Constructing a SQL context in Python
hiveCtx = HiveContext(sc)
Now that we have a HiveContext or SQLContext, we are ready to load our data and
query it.
Basic Query Example
To make a query against a table, we call the sql() method on the HiveContext or
SQLContext. The first thing we need to do is tell Spark SQL about some data to
query. In this case we will load some Twitter data from JSON, and give it a name by
registering it as a “temporary table” so we can query it with SQL. (We will go over
more details on loading in “Loading and Saving Data” on page 170.) Then we can select
the top tweets by retweetCount. See Examples 9-9 through 9-11.
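To see what such a query computes, here is a plain-Python sketch of the same logic, outside Spark entirely. The sample records and the limit of 2 are made up for illustration; a real query against a registered table would look like hiveCtx.sql("SELECT text, retweetCount FROM tweets ORDER BY retweetCount DESC LIMIT 10").

```python
# Sample records standing in for parsed JSON tweets (hypothetical data).
tweets = [
    {"text": "hello", "retweetCount": 3},
    {"text": "spark sql", "retweetCount": 42},
    {"text": "big data", "retweetCount": 17},
]

# Plain-Python equivalent of:
#   SELECT text, retweetCount FROM tweets
#   ORDER BY retweetCount DESC LIMIT 2
top_tweets = sorted(tweets, key=lambda t: t["retweetCount"], reverse=True)[:2]
```

The ORDER BY ... LIMIT pattern is how SQL expresses "top N by some column"; Spark SQL runs the same idea distributed across the cluster instead of over an in-memory list.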