REFRESH : In a multinode environment, the data files reside on multiple
DataNodes, while the Impala shell interacts with a single Impala daemon
(which acts as the coordinator for all other nodes). Data files can be
updated on other nodes without any update event or notification reaching the
coordinator. In this situation, issuing the REFRESH statement with the table
name reloads the latest metadata and block locations of the data files for
that table. Refer to Chapter 2 , The Impala Shell Commands and Interface , to
learn more about how REFRESH works and why it is so important to use.
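The stale-metadata situation described above can be pictured with a small toy model. This is only an illustrative sketch in Python, not Impala code: the `Storage` and `Coordinator` classes, the `sales` table, and the file names are all hypothetical, and the model assumes nothing more than a coordinator that caches one file list per table.

```python
from dataclasses import dataclass, field


@dataclass
class Storage:
    """Hypothetical stand-in for the DataNodes: table name -> data file names."""
    files: dict = field(default_factory=dict)

    def add_file(self, table, name):
        self.files.setdefault(table, []).append(name)


@dataclass
class Coordinator:
    """Hypothetical stand-in for the daemon the shell talks to."""
    storage: Storage
    cache: dict = field(default_factory=dict)

    def refresh(self, table):
        # Reload the file list for one table, in the spirit of REFRESH table_name.
        self.cache[table] = list(self.storage.files.get(table, []))

    def visible_files(self, table):
        # Queries see only the cached list, not the current state of storage.
        return self.cache.get(table, [])


storage = Storage()
storage.add_file("sales", "part-0000")
coordinator = Coordinator(storage)
coordinator.refresh("sales")

# A new file is written without any notification to the coordinator.
storage.add_file("sales", "part-0001")
stale = coordinator.visible_files("sales")

coordinator.refresh("sales")
fresh = coordinator.visible_files("sales")
print(stale, fresh)
```

In this model the new file stays invisible to the coordinator until `refresh` is called for that table, which is the gap the REFRESH statement closes.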
JOIN : The JOIN clause is used in SQL statements to select data from two or
more tables and return a result set containing items from some or all of
those tables, depending on the conditions applied. The rows to be matched are
specified either by naming the corresponding join columns in the ON clause or
by using comparison operators in the WHERE clause that reference columns from
both tables. To improve JOIN performance, here are some suggestions:
• It is advisable to join the biggest table first and then the smaller
tables
• Join subsequent tables depending on which table has the most
selective filter
Note
The JOIN clause itself is very detailed, so I have introduced it here only
for reference; however, I would suggest you study some SQL documentation
on JOIN to learn more about it.
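As a small illustration of the two ways to state a join condition, the following sketch uses Python's built-in sqlite3 module as a stand-in SQL engine (an assumption made so that the example runs anywhere; it is not Impala, and the `orders` and `customers` tables are hypothetical). The same equality condition is written once in an ON clause and once in a WHERE clause.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 10, 25.0), (2, 11, 40.0), (3, 12, 15.0);
    INSERT INTO customers VALUES (10, 'Ana'), (11, 'Ben');
""")

# Join condition expressed in the ON clause.
on_rows = conn.execute("""
    SELECT o.order_id, c.name
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# The same condition expressed as a comparison in the WHERE clause.
where_rows = conn.execute("""
    SELECT o.order_id, c.name
    FROM orders o, customers c
    WHERE o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

conn.close()
print(on_rows)
print(where_rows)
```

Both queries return the same matched rows here; order 3 is left out because no customer row has a matching customer_id. The larger table, orders, is listed first only to echo the suggestion above; SQLite plans its joins differently from Impala, so this sketch says nothing about performance.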