connectors to Hive can also connect to Spark SQL using their existing Hive connector, because it uses the same query language and server.
Working with Beeline
Within the Beeline client, you can use standard HiveQL commands to create, list, and query tables. You can find the full details of HiveQL in the Hive Language Manual, but here we show a few common operations.

First, to create a table from local data, we can use the CREATE TABLE command, followed by LOAD DATA. Hive easily supports loading text files with a fixed delimiter, such as CSVs, as well as other files, as shown in Example 9-33.
Example 9-33. Load table
> CREATE TABLE IF NOT EXISTS mytable (key INT, value STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> LOAD DATA LOCAL INPATH 'learning-spark-examples/files/int_string.csv'
  INTO TABLE mytable;
To list tables, you can use the SHOW TABLES statement (Example 9-34). You can also describe each table's schema with DESCRIBE tableName.
Example 9-34. Show tables
> SHOW TABLES;
mytable
Time taken: 0.052 seconds
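For instance, to inspect the schema of the table created in Example 9-33, you can run DESCRIBE on it. A sketch of the command and its output follows; the exact output layout varies between Hive and Spark SQL versions:

> DESCRIBE mytable;
key      int
value    string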
If you'd like to cache tables, use CACHE TABLE tableName. You can later uncache tables with UNCACHE TABLE tableName. Note that the cached tables are shared across all clients of this JDBC server, as explained earlier.
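Applied to the table from Example 9-33, the caching commands take the following form (a minimal sketch; there is no output to show beyond the timing line Beeline prints for each statement):

> CACHE TABLE mytable;
> UNCACHE TABLE mytable;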
Finally, Beeline makes it easy to view query plans. You can run EXPLAIN on a given query to see what the execution plan will be, as shown in Example 9-35.
Example 9-35. Spark SQL shell EXPLAIN
spark-sql> EXPLAIN SELECT * FROM mytable where key = 1;
== Physical Plan ==
Filter (key#16 = 1)
 HiveTableScan [key#16,value#17], (MetastoreRelation default, mytable, None), None
Time taken: 0.551 seconds
In this specific query plan, Spark SQL is applying a filter on top of a HiveTableScan.