Spark SQL - Learning Spark


  public void setName(String n) { name = n; }
  public String getFavouriteBeverage() { return favouriteBeverage; }
  public void setFavouriteBeverage(String b) { favouriteBeverage = b; }
};
...
ArrayList<HappyPerson> peopleList = new ArrayList<HappyPerson>();
peopleList.add(new HappyPerson("holden", "coffee"));
JavaRDD<HappyPerson> happyPeopleRDD = sc.parallelize(peopleList);
SchemaRDD happyPeopleSchemaRDD = hiveCtx.applySchema(happyPeopleRDD,
  HappyPerson.class);
happyPeopleSchemaRDD.registerTempTable("happy_people");

JDBC/ODBC Server

Spark SQL also provides JDBC connectivity, which is useful for connecting business intelligence (BI) tools to a Spark cluster and for sharing a cluster across multiple users. The JDBC server runs as a standalone Spark driver program that can be shared by multiple clients. Any client can cache tables in memory, query them, and so on, and the cluster resources and cached data will be shared among all of them.

Spark SQL's JDBC server corresponds to HiveServer2 in Hive. It is also known as the “Thrift server” since it uses the Thrift communication protocol. Note that the JDBC server requires Spark to be built with Hive support.

The server can be launched with sbin/start-thriftserver.sh in your Spark directory (Example 9-31). This script takes many of the same options as spark-submit. By default it listens on localhost:10000, but we can change the host and port with either environment variables (HIVE_SERVER2_THRIFT_PORT and HIVE_SERVER2_THRIFT_BIND_HOST) or Hive configuration properties (hive.server2.thrift.port and hive.server2.thrift.bind.host). You can also specify Hive properties on the command line with --hiveconf property=value.

Example 9-31. Launching the JDBC server

./sbin/start-thriftserver.sh --master sparkMaster

Spark also ships with the Beeline client program we can use to connect to our JDBC server, as shown in Example 9-32 and Figure 9-3. This is a simple SQL shell that lets us run commands on the server.

Example 9-32. Connecting to the JDBC server with Beeline

holden@hmbp2:~/repos/spark$ ./bin/beeline -u jdbc:hive2://localhost:10000
Spark assembly has been built with Hive, including Datanucleus jars on classpath
scan complete in 1ms
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 1.2.0-SNAPSHOT)
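
Because the server speaks the HiveServer2 protocol, a program should be able to connect with the same jdbc:hive2:// URL that Beeline uses. The following is a minimal sketch in Java rather than a definitive client. It assumes that the Hive JDBC driver is on the classpath, that the server is running without authentication on the default localhost:10000, and that connecting with the local username and an empty password is accepted. The class name ThriftServerClientSketch and its helper methods are hypothetical names chosen for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical client sketch; not part of Spark or Hive.
public class ThriftServerClientSketch {

    // Builds a URL of the same form that is passed to beeline -u.
    static String jdbcUrl(String host, int port) {
        return "jdbc:hive2://" + host + ":" + port;
    }

    // Runs one query and collects the first column of every row as text.
    static List<String> firstColumn(String url, String sql) throws SQLException {
        List<String> values = new ArrayList<>();
        // Assumption: an unsecured server that accepts any username
        // with an empty password.
        String user = System.getProperty("user.name");
        try (Connection conn = DriverManager.getConnection(url, user, "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                values.add(rs.getString(1));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Default host and port from the text; adjust if the server was
        // started with a different bind host or port.
        String url = jdbcUrl("localhost", 10000);
        String sql = args.length > 0 ? args[0] : "SHOW TABLES";
        try {
            for (String value : firstColumn(url, sql)) {
                System.out.println(value);
            }
        } catch (SQLException e) {
            // Reached when no driver is on the classpath or no server is listening.
            System.err.println("Could not query " + url + ": " + e.getMessage());
        }
    }
}
```

If the server was started on a different host or port, the arguments to jdbcUrl would need to change to match. A query can be passed as the first command-line argument; without one, the sketch runs SHOW TABLES.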
