connectors to Hive can also connect to Spark SQL using their existing Hive connector, because it uses the same query language and server.
Working with Beeline
Within the Beeline client, you can use standard HiveQL commands to create, list, and query tables. You can find the full details of HiveQL in the Hive Language Manual, but here we show a few common operations.
First, to create a table from local data, we can use the CREATE TABLE command, followed by LOAD DATA. Hive easily supports loading text files with a fixed delimiter, such as CSVs, as well as other files, as shown in Example 9-33.
Example 9-33. Load table
> CREATE TABLE IF NOT EXISTS mytable (key INT, value STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> LOAD DATA LOCAL INPATH 'learning-spark-examples/files/int_string.csv'
INTO TABLE mytable;
To list tables, you can use the SHOW TABLES statement (Example 9-34). You can also describe each table's schema with DESCRIBE tableName.
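For instance, assuming the mytable table created in Example 9-33, DESCRIBE prints one row per column (output abbreviated here; the exact formatting varies by Hive version):

> DESCRIBE mytable;
key      int
value    string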
Example 9-34. Show tables
> SHOW TABLES;
mytable
Time taken: 0.052 seconds
If you'd like to cache tables, use CACHE TABLE tableName . You can later uncache
tables with UNCACHE TABLE tableName . Note that the cached tables are shared across
all clients of this JDBC server, as explained earlier.
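As a sketch, again assuming the mytable table from Example 9-33, you might cache the table before a batch of repeated queries and release the memory afterward:

> CACHE TABLE mytable;
> SELECT COUNT(*) FROM mytable;
> UNCACHE TABLE mytable;

Queries issued between the CACHE and UNCACHE statements, from any client of the server, can read the in-memory copy of the table.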
Finally, Beeline makes it easy to view query plans. You can run EXPLAIN on a given query to see what the execution plan will be, as shown in Example 9-35.
Example 9-35. Spark SQL shell EXPLAIN
spark-sql> EXPLAIN SELECT * FROM mytable WHERE key = 1;
== Physical Plan ==
Filter (key#16 = 1)
 HiveTableScan [key#16,value#17], (MetastoreRelation default, mytable, None), None
Time taken: 0.551 seconds
In this specific query plan, Spark SQL is applying a filter on top of a HiveTableScan .
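If you want to see the logical plans that lead to this physical plan, Spark SQL also accepts EXPLAIN EXTENDED; the exact output depends on your Spark version, but the command takes the same form:

> EXPLAIN EXTENDED SELECT * FROM mytable WHERE key = 1;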