Example 5-36. JSON loading with Spark SQL in Java
SchemaRDD tweets = hiveCtx.jsonFile(jsonFile);
tweets.registerTempTable("tweets");
SchemaRDD results = hiveCtx.sql("SELECT user.name, text FROM tweets");
We discuss how to load JSON data with Spark SQL and access its schema in
"JSON" on page 172. In addition, Spark SQL supports much more than loading
data, including querying the data, combining it in more complex ways with RDDs,
and running custom functions over it, all of which we cover in Chapter 9.
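As a taste of that RDD interoperability, here is a minimal Scala sketch, not an example from the text itself: it assumes a HiveContext named hiveCtx and an input path in inputFile (both hypothetical names), and treats the query result as an ordinary RDD of rows.

// A hedged Scala sketch of the same pattern as Example 5-36, followed
// by plain RDD transformations over the result.
val tweets = hiveCtx.jsonFile(inputFile)
tweets.registerTempTable("tweets")
val results = hiveCtx.sql("SELECT user.name, text FROM tweets")
// A SchemaRDD is also an RDD of Row objects, so normal RDD
// transformations apply; here we pull out the first column.
val names = results.map(row => row.getString(0))
names.take(5).foreach(println)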
Databases
Spark can access several popular databases using either their Hadoop connectors or
custom Spark connectors. In this section, we will show four common connectors.
Java Database Connectivity
Spark can load data from any relational database that supports Java Database
Connectivity (JDBC), including MySQL, Postgres, and other systems. To access this
data, we construct an org.apache.spark.rdd.JdbcRDD and provide it with our
SparkContext and the other parameters. Example 5-37 walks you through using
JdbcRDD for a MySQL database.
Example 5-37. JdbcRDD in Scala
import java.sql.{DriverManager, ResultSet}
import org.apache.spark.rdd.JdbcRDD

def createConnection() = {
  Class.forName("com.mysql.jdbc.Driver").newInstance()
  DriverManager.getConnection("jdbc:mysql://localhost/test?user=holden")
}

def extractValues(r: ResultSet) = {
  (r.getInt(1), r.getString(2))
}

val data = new JdbcRDD(sc,
  createConnection, "SELECT * FROM panda WHERE ? <= id AND id <= ?",
  lowerBound = 1, upperBound = 3, numPartitions = 2, mapRow = extractValues)
println(data.collect().toList)
JdbcRDD takes several parameters:
• First, we provide a function to establish a connection to our database. This lets
each node create its own connection to load data over, after performing any
configuration required to connect (a standalone sketch of this per-partition
pattern follows this list).
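To make the connection factory and the query's two ? placeholders concrete, here is a hedged, standalone sketch of what each partition effectively does: open its own connection, bind its slice of the [lowerBound, upperBound] range into the prepared statement, and extract a value per row. The helper name queryRange is ours for illustration; the actual range-splitting arithmetic lives inside JdbcRDD and may differ.

// A hedged sketch (not JdbcRDD's actual implementation): each Spark
// partition opens its own connection and binds its slice of the id
// range into the query's two '?' placeholders.
import java.sql.DriverManager

def queryRange(low: Long, high: Long): List[(Int, String)] = {
  Class.forName("com.mysql.jdbc.Driver").newInstance()
  val conn = DriverManager.getConnection("jdbc:mysql://localhost/test?user=holden")
  try {
    val stmt = conn.prepareStatement("SELECT * FROM panda WHERE ? <= id AND id <= ?")
    stmt.setLong(1, low)   // first '?': lower end of this partition's slice
    stmt.setLong(2, high)  // second '?': upper end of this partition's slice
    val rs = stmt.executeQuery()
    val buf = scala.collection.mutable.ListBuffer[(Int, String)]()
    while (rs.next()) buf += ((rs.getInt(1), rs.getString(2)))
    buf.toList
  } finally {
    conn.close()
  }
}

// With lowerBound = 1, upperBound = 3, and numPartitions = 2, the work
// splits into two such calls that together cover ids 1 through 3.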