Loading and Saving Your Data - Learning Spark


• Next, we provide a query that can read a range of the data, as well as a lowerBound
and upperBound value for the parameter to this query. These parameters allow Spark to
query different ranges of the data on different machines, so we don't get bottlenecked
trying to load all the data on a single node. [7]

• The last parameter is a function that converts each row of output from a
java.sql.ResultSet to a format that is useful for manipulating our data. In
Example 5-37, we will get (Int, String) pairs. If this parameter is left out, Spark
will automatically convert each row to an array of objects.
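To make the role of lowerBound and upperBound concrete, here is a minimal sketch of how an inclusive key range could be divided into one sub-range per partition, so that each machine queries only its own slice. The names RangeSplit, Range, and split are hypothetical and chosen for illustration; this is not Spark's API, and Spark's own splitting logic may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Spark.
public class RangeSplit {

    /** Inclusive [start, end] key range that one partition would query. */
    public static final class Range {
        public final long start;
        public final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + "]";
        }
    }

    /**
     * Splits the inclusive range [lowerBound, upperBound] into numPartitions
     * contiguous, non-overlapping sub-ranges of nearly equal size.
     */
    public static List<Range> split(long lowerBound, long upperBound, int numPartitions) {
        if (numPartitions < 1 || lowerBound > upperBound) {
            throw new IllegalArgumentException("need numPartitions >= 1 and lowerBound <= upperBound");
        }
        long length = upperBound - lowerBound + 1; // number of keys in the range
        List<Range> ranges = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            // Integer division spreads any remainder across the later partitions.
            long start = lowerBound + (i * length) / numPartitions;
            long end = lowerBound + ((i + 1) * length) / numPartitions - 1;
            ranges.add(new Range(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // With bounds 1..100 and 4 partitions, prints [1, 25], [26, 50], [51, 75], [76, 100].
        for (Range r : split(1, 100, 4)) {
            System.out.println(r);
        }
    }
}
```

Each sub-range would supply the two bound values for one partition's copy of the query, which is why the query has to be written so that it can read an arbitrary range of the data.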

As with other data sources, when using JdbcRDD, make sure that your database can
handle the load of parallel reads from Spark. If you'd like to query the data offline
rather than the live database, you can always use your database's export feature to
export a text file.

Cassandra

Spark's Cassandra support has improved greatly with the introduction of the open
source Spark Cassandra connector from DataStax. Since the connector is not currently
part of Spark, you will need to add some further dependencies to your build file.
Cassandra doesn't yet use Spark SQL, but it returns RDDs of CassandraRow objects,
which have some of the same methods as Spark SQL's Row object, as shown in
Examples 5-38 and 5-39. The Spark Cassandra connector is currently only available in
Java and Scala.

Example 5-38. sbt requirements for Cassandra connector

"com.datastax.spark" %% "spark-cassandra-connector" % "1.0.0-rc5",
"com.datastax.spark" %% "spark-cassandra-connector-java" % "1.0.0-rc5"

Example 5-39. Maven requirements for Cassandra connector

<dependency>
  <groupId>com.datastax.spark</groupId>
  <artifactId>spark-cassandra-connector</artifactId>
  <version>1.0.0-rc5</version>
</dependency>
<dependency>
  <groupId>com.datastax.spark</groupId>
  <artifactId>spark-cassandra-connector-java</artifactId>
  <version>1.0.0-rc5</version>
</dependency>

[7] If you don't know how many records there are, you can just do a count query manually first and use its result
to determine the upperBound and lowerBound.
