• Next, we provide a query that can read a range of the data, as well as a
lowerBound and upperBound value for the parameter to this query. These parameters
allow Spark to query different ranges of the data on different machines, so we
don't get bottlenecked trying to load all the data on a single node. 7
• The last parameter is a function that converts each row of output from a
java.sql.ResultSet to a format that is useful for manipulating our data. In
Example 5-37, we will get (Int, String) pairs. If this parameter is left out, Spark
will automatically convert each row to an array of objects.
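To make the role of lowerBound and upperBound more concrete, here is a minimal sketch in plain Java of how an inclusive range might be divided into contiguous subranges, one per partition. It illustrates the idea only and is not taken from Spark's source; the class name BoundSplitSketch, the split method, and the id column in the printed predicate are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: splits an inclusive [lowerBound, upperBound] range
// into numPartitions contiguous, non-overlapping subranges.
class BoundSplitSketch {

    // Returns one {start, end} pair (both inclusive) per partition.
    // Assumes the bounds are small enough that the arithmetic does not overflow.
    static List<long[]> split(long lowerBound, long upperBound, int numPartitions) {
        if (numPartitions <= 0 || lowerBound > upperBound) {
            throw new IllegalArgumentException("need numPartitions > 0 and lowerBound <= upperBound");
        }
        long length = upperBound - lowerBound + 1; // number of values in the range
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            long start = lowerBound + (i * length) / numPartitions;
            long end = lowerBound + ((i + 1) * length) / numPartitions - 1;
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Each subrange would fill the two parameters of a range query,
        // so every machine reads a different slice of the table.
        for (long[] r : split(1, 100, 4)) {
            System.out.println("id >= " + r[0] + " AND id <= " + r[1]);
        }
    }
}
```

With bounds 1 and 100 and four partitions, the sketch produces the subranges 1 to 25, 26 to 50, 51 to 75, and 76 to 100. If the ids are unevenly distributed, equal-width subranges like these can still hold very different numbers of rows.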
As with other data sources, when using JdbcRDD, make sure that your database can
handle the load of parallel reads from Spark. If you'd like to query the data offline
rather than against the live database, you can always use your database's export
feature to export a text file.
Cassandra
Spark's Cassandra support has improved greatly with the introduction of the open
source Spark Cassandra connector from DataStax. Since the connector is not currently
part of Spark, you will need to add some further dependencies to your build
file, as shown in Examples 5-38 and 5-39. The connector doesn't yet use Spark SQL,
but it returns RDDs of CassandraRow objects, which have some of the same methods as
Spark SQL's Row object. The Spark Cassandra connector is currently only available in
Java and Scala.
Example 5-38. sbt requirements for Cassandra connector
"com.datastax.spark" %% "spark-cassandra-connector" % "1.0.0-rc5",
"com.datastax.spark" %% "spark-cassandra-connector-java" % "1.0.0-rc5"
Example 5-39. Maven requirements for Cassandra connector
<dependency> <!-- Cassandra -->
  <groupId>com.datastax.spark</groupId>
  <artifactId>spark-cassandra-connector</artifactId>
  <version>1.0.0-rc5</version>
</dependency>
<dependency> <!-- Cassandra -->
  <groupId>com.datastax.spark</groupId>
  <artifactId>spark-cassandra-connector-java</artifactId>
  <version>1.0.0-rc5</version>
</dependency>
7 If you don't know how many records there are, you can just do a count query manually first and use its result
to determine the upperBound and lowerBound.
 