Database Reference
In-Depth Information
ber of records or rows, which are part of the result set. After that, select the
next largest table in terms of the number of rows in the results set, and finally
the smallest table.
• Another method to optimize a faster JOIN operation is to use the HINT
clause with JOIN to select a specific Impala query planner. When a SELECT
statement based on JOIN is given to Impala to execute, the Impala query
planner works on it to find the best strategy. The Impala query planner first
checks metadata and the number of records in the result set for each table in
the SELECT statement and then chooses an appropriate JOIN strategy. You
can get this information by using the EXPLAIN clause with your query. If you
think that changing the JOIN strategy will be helpful, you can use the HINT
clause as follows to apply the specific JOIN type. You can learn more about
HINT in any detailed SQL documentation:
SELECT table1.field1, table2.field1 FROM
table1 JOIN
[BROADCAST | SHUFFLE] table2 ON
[condition….]
The SQL JOIN operation itself is very large and requires a great deal of understand-
ing to optimize. While the preceding information is good for reference purposes,
I would suggest reading more details on it in an external reference document to
achieve optimum performance with JOIN queries.
Table and column statistics
In the previous section, we talked about the query planner. Now, we will learn how
the query planner decides which strategy is best for it. The query planner uses indi-
vidual column statistics by getting metadata from the metastore if it is available. All
the columns, which are part of the result set in the JOIN query, are calculated for all
the records, which helps the query planner to make its decision.
The Impala query planner also uses statistics for all the tables and partitions from the
metastore and, based on this information, it makes a decision. These table, column,
and partition statistics can be gathered using the ANALYZE TABLE statement by
passing the table name. I would like to inform you that the Impala query planner
does not create this information; instead it depends on Hive for this and the ANALYZE
Search WWH ::




Custom Search