  public void setName(String n) { name = n; }
  public String getFavouriteBeverage() { return favouriteBeverage; }
  public void setFavouriteBeverage(String b) { favouriteBeverage = b; }
}
...
ArrayList<HappyPerson> peopleList = new ArrayList<HappyPerson>();
peopleList.add(new HappyPerson("holden", "coffee"));
JavaRDD<HappyPerson> happyPeopleRDD = sc.parallelize(peopleList);
// Infer the schema from HappyPerson's JavaBean getters and setters
SchemaRDD happyPeopleSchemaRDD = hiveCtx.applySchema(happyPeopleRDD,
  HappyPerson.class);
// Register the data so SQL queries can refer to it as happy_people
happyPeopleSchemaRDD.registerTempTable("happy_people");
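applySchema() can take HappyPerson.class because HappyPerson follows JavaBean conventions: Spark SQL derives the table's columns from the class's bean properties (its getter/setter pairs). To see which properties such a class exposes, the sketch below uses the JDK's java.beans.Introspector. The Person class and the BeanColumns.propertyNames helper are hypothetical names chosen for illustration, and the sketch only approximates the information that schema inference works from; it is not Spark's own code.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class BeanColumns {
    private BeanColumns() {}

    // Hypothetical stand-in for HappyPerson: public, with a no-arg constructor
    // and one getter/setter pair per field, as JavaBean conventions expect.
    public static class Person implements Serializable {
        private String name;
        private String favouriteBeverage;

        public Person() {}
        public String getName() { return name; }
        public void setName(String n) { name = n; }
        public String getFavouriteBeverage() { return favouriteBeverage; }
        public void setFavouriteBeverage(String b) { favouriteBeverage = b; }
    }

    // Lists the bean properties that have a getter, skipping the "class"
    // property that every object inherits from Object.getClass().
    static List<String> propertyNames(Class<?> beanClass) throws IntrospectionException {
        BeanInfo info = Introspector.getBeanInfo(beanClass);
        List<String> names = new ArrayList<>();
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            if (pd.getReadMethod() != null && !pd.getName().equals("class")) {
                names.add(pd.getName());
            }
        }
        Collections.sort(names); // sort so the result does not depend on introspection order
        return names;
    }
}
```

For the Person class the helper returns [favouriteBeverage, name], the two properties we would expect to become queryable columns of happy_people.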
JDBC/ODBC Server
Spark SQL also provides JDBC connectivity, which is useful for connecting business
intelligence (BI) tools to a Spark cluster and for sharing a cluster across multiple
users. The JDBC server runs as a standalone Spark driver program that can be shared
by multiple clients. Any client can cache tables in memory, query them, and so on,
and the cluster resources and cached data will be shared among all of them.
Spark SQL's JDBC server corresponds to HiveServer2 in Hive. It is also known as
the “Thrift server” because it uses the Thrift communication protocol. Note that the
JDBC server requires Spark to be built with Hive support.
The server can be launched with sbin/start-thriftserver.sh in your Spark directory
(Example 9-31). This script takes many of the same options as spark-submit. By
default it listens on localhost:10000, but we can change the host and port with either
environment variables (HIVE_SERVER2_THRIFT_PORT and HIVE_SERVER2_THRIFT_BIND_HOST)
or Hive configuration properties (hive.server2.thrift.port and
hive.server2.thrift.bind.host). You can also specify Hive properties on the command
line with --hiveconf property=value.
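To make these configuration options concrete, here is a small sketch that works out the JDBC URL a client would use to reach the server. ThriftEndpoint and jdbcUrl are hypothetical names, and the precedence the sketch applies (environment variable first, then Hive property, then the localhost:10000 default) is an assumption for illustration; check how your Spark and Hive versions resolve conflicting settings.

```java
import java.util.Map;

// Hypothetical helper (not part of Spark or Hive): resolves the Thrift server's
// host and port from the given settings and returns a matching JDBC URL.
// ASSUMPTION: environment variables override Hive properties, which in turn
// override the built-in default of localhost:10000.
final class ThriftEndpoint {
    static final String DEFAULT_HOST = "localhost";
    static final int DEFAULT_PORT = 10000;

    private ThriftEndpoint() {}

    static String jdbcUrl(Map<String, String> env, Map<String, String> hiveConf) {
        String host = firstNonEmpty(
                env.get("HIVE_SERVER2_THRIFT_BIND_HOST"),
                hiveConf.get("hive.server2.thrift.bind.host"),
                DEFAULT_HOST);
        String port = firstNonEmpty(
                env.get("HIVE_SERVER2_THRIFT_PORT"),
                hiveConf.get("hive.server2.thrift.port"),
                String.valueOf(DEFAULT_PORT));
        int portNum = Integer.parseInt(port.trim()); // fails fast on a malformed port
        return "jdbc:hive2://" + host + ":" + portNum;
    }

    // Returns the first candidate that is neither null nor empty.
    private static String firstNonEmpty(String... candidates) {
        for (String c : candidates) {
            if (c != null && !c.isEmpty()) {
                return c;
            }
        }
        // Unreachable in practice: the last candidate is always a non-empty default.
        throw new IllegalStateException("no value found");
    }
}
```

For example, ThriftEndpoint.jdbcUrl(System.getenv(), Map.of()) builds the URL from the current process environment alone. Keep in mind that the bind host is the address the server listens on; if it is a wildcard such as 0.0.0.0, a client needs the machine's actual hostname instead.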
Example 9-31. Launching the JDBC server
./sbin/start-thriftserver.sh --master sparkMaster
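The --hiveconf flag can be repeated, once per property. If you launch the server from a Java program rather than a shell, a sketch like the following assembles the same command line as Example 9-31 plus one --hiveconf pair per property. ThriftServerCommand.build is a hypothetical helper, and sparkHome and master are placeholders for your installation path and cluster URL; the code only builds the argument list and runs nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: builds the argument list for sbin/start-thriftserver.sh,
// turning each Hive property into a "--hiveconf property=value" pair.
final class ThriftServerCommand {
    private ThriftServerCommand() {}

    static List<String> build(String sparkHome, String master, Map<String, String> hiveProps) {
        List<String> cmd = new ArrayList<>();
        cmd.add(sparkHome + "/sbin/start-thriftserver.sh");
        cmd.add("--master");
        cmd.add(master);
        // Copy into a TreeMap so properties appear in a stable, sorted order.
        for (Map.Entry<String, String> e : new TreeMap<>(hiveProps).entrySet()) {
            cmd.add("--hiveconf");
            cmd.add(e.getKey() + "=" + e.getValue());
        }
        return cmd;
    }
}
```

Passing the resulting list to new ProcessBuilder(cmd).inheritIO().start() would then run the script.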
Spark also ships with the Beeline client program, which we can use to connect to our
JDBC server, as shown in Example 9-32 and Figure 9-3. Beeline is a simple SQL shell
that lets us run commands on the server.
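Beeline is only one client: because the server speaks the HiveServer2 protocol, any JDBC program can connect the same way, which is how BI tools use it. The following minimal sketch opens a connection with java.sql.DriverManager, runs a statement, and returns each row as a tab-separated string. It assumes the Hive JDBC driver (class org.apache.hive.jdbc.HiveDriver, from the hive-jdbc artifact) and its dependencies are on the classpath; ThriftJdbcClient and runQuery are hypothetical names.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Minimal JDBC client for the Spark SQL Thrift server, similar in spirit to Beeline.
// ASSUMPTION: the Hive JDBC driver and its dependencies are on the classpath.
final class ThriftJdbcClient {
    private ThriftJdbcClient() {}

    static List<String> runQuery(String url, String sql) throws SQLException {
        try {
            // Older driver versions may not register themselves with DriverManager.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
        } catch (ClassNotFoundException e) {
            // Driver missing: getConnection below will throw "No suitable driver".
        }
        List<String> rows = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int cols = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                // Join every column of the row, so the output does not depend
                // on how many columns a given Spark version returns.
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= cols; i++) {
                    if (i > 1) {
                        row.append('\t');
                    }
                    row.append(rs.getString(i));
                }
                rows.add(row.toString());
            }
        }
        return rows;
    }
}
```

Against a server running with default settings, a call such as ThriftJdbcClient.runQuery("jdbc:hive2://localhost:10000", "SHOW TABLES") should list the tables it knows about. Without the driver or a reachable server, DriverManager raises a SQLException instead.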
Example 9-32. Connecting to the JDBC server with Beeline
holden@hmbp2:~/repos/spark $ ./bin/beeline -u jdbc:hive2://localhost:10000
Spark assembly has been built with Hive, including Datanucleus jars on classpath
scan complete in 1ms
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 1.2.0-SNAPSHOT)