connectors to Hive can also connect to Spark SQL using their existing Hive connector, because it uses the same query language and server.
Working with Beeline
Within the Beeline client, you can use standard HiveQL commands to create, list, and query tables. You can find the full details of HiveQL in the Hive Language Manual, but here we show a few common operations.

First, to create a table from local data, we can use the CREATE TABLE command, followed by LOAD DATA. Hive easily supports loading text files with a fixed delimiter, such as CSVs, as well as other files, as shown in Example 9-33.
Example 9-33. Load table
> CREATE TABLE IF NOT EXISTS mytable (key INT, value STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> LOAD DATA LOCAL INPATH 'learning-spark-examples/files/int_string.csv'
  INTO TABLE mytable;
To list tables, you can use the SHOW TABLES statement (Example 9-34). You can also describe each table's schema with DESCRIBE tableName.
Example 9-34. Show tables
> SHOW TABLES;
mytable
Time taken: 0.052 seconds
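For instance, to inspect the schema of the table created in Example 9-33, you can run DESCRIBE on it. A sketch of the command and its output follows; the exact output layout varies between Hive and Spark SQL versions:

> DESCRIBE mytable;
key      int
value    string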
If you'd like to cache tables, use CACHE TABLE tableName. You can later uncache tables with UNCACHE TABLE tableName. Note that the cached tables are shared across all clients of this JDBC server, as explained earlier.
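Applied to the table from Example 9-33, the caching commands take the following form (a minimal sketch; there is no output to show beyond the timing line Beeline prints for each statement):

> CACHE TABLE mytable;
> UNCACHE TABLE mytable;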
Finally, Beeline makes it easy to view query plans. You can run EXPLAIN on a given query to see what the execution plan will be, as shown in Example 9-35.
Example 9-35. Spark SQL shell EXPLAIN
spark-sql> EXPLAIN SELECT * FROM mytable where key = 1;
== Physical Plan ==
Filter (key#16 = 1)
 HiveTableScan [key#16,value#17], (MetastoreRelation default, mytable, None), None
Time taken: 0.551 seconds
In this specific query plan, Spark SQL is applying a filter on top of a HiveTableScan.