Database Reference
In-Depth Information
• REFRESH: In a multinode environment, the data files reside on multiple DataNodes, while the Impala shell interacts with a single Impala daemon (which acts as the coordinator for all other nodes). Data files can be updated on other nodes without any update event or notification reaching the coordinator. In this situation, running REFRESH with the table name loads the latest metadata and block locations of the data files for that table.
derstand more on how REFRESH works and why it is so important to use.
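To illustrate why a coordinator with stale metadata needs REFRESH, here is a minimal Python sketch of the situation described above. It is a toy model, not Impala code: the DataNode and Coordinator classes, their methods, and the file paths are hypothetical, and the coordinator's metadata cache is reduced to a list of (node, file) pairs per table. Calling refresh() plays the role of running REFRESH on the table.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical toy model: DataNode and Coordinator are illustrative names,
# not part of any Impala API.
@dataclass
class DataNode:
    """A storage node that holds data files for one or more tables."""
    name: str
    files: dict[str, list[str]] = field(default_factory=dict)

    def write_file(self, table: str, path: str) -> None:
        # The file lands on this node directly; the coordinator is NOT notified.
        self.files.setdefault(table, []).append(path)


@dataclass
class Coordinator:
    """Plans queries from a cache of per-table file locations."""
    nodes: list[DataNode]
    cache: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def refresh(self, table: str) -> None:
        # Re-scan every node and rebuild the cached (node, file) list for one table.
        self.cache[table] = [
            (node.name, path)
            for node in self.nodes
            for path in node.files.get(table, [])
        ]

    def visible_files(self, table: str) -> list[tuple[str, str]]:
        # Only cached metadata is consulted; nodes are not re-scanned here.
        return self.cache.get(table, [])


dn1, dn2 = DataNode("dn1"), DataNode("dn2")
coord = Coordinator([dn1, dn2])

dn1.write_file("sales", "/data/sales/part-0.parq")
coord.refresh("sales")
print(coord.visible_files("sales"))  # [('dn1', '/data/sales/part-0.parq')]

# A new file is written on another node without telling the coordinator.
dn2.write_file("sales", "/data/sales/part-1.parq")
print(coord.visible_files("sales"))  # still one file: the metadata is stale

coord.refresh("sales")  # analogous to REFRESH on the sales table
print(coord.visible_files("sales"))  # both files are now visible
```

The second print shows the stale view: the file exists on dn2, but the coordinator cannot plan a scan of it until its cached metadata is reloaded.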
• JOIN: The JOIN clause is used in SQL statements to select data from two or more tables and return a result set containing items from some or all of those tables, depending on the conditions applied. The JOIN result set is filtered either by naming the corresponding join columns in the ON clause or by using comparison operators that reference columns from both tables in the WHERE clause. To improve JOIN performance, here are some suggestions:
• Perform the JOIN on the biggest table first and then on the smaller tables.
• Join the remaining tables in order of filter selectivity, starting with the table whose filter is most selective (the one that leaves the fewest rows).
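The two suggestions above can be expressed as a small ordering rule. This Python sketch assumes hypothetical per-table statistics (total row count and estimated rows remaining after each table's filter); the TableStats class, the join_order function, and the sample tables are illustrative only and are not an Impala interface. It places the biggest table first, then orders the remaining tables from the most to the least selective filter.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Hypothetical per-table statistics used only for this illustration."""
    name: str
    total_rows: int
    rows_after_filter: int  # estimated rows surviving this table's WHERE predicates

    @property
    def selectivity(self) -> float:
        # Fraction of rows kept; a smaller value means a more selective filter.
        return self.rows_after_filter / self.total_rows if self.total_rows else 1.0


def join_order(tables: list[TableStats]) -> list[str]:
    """Return table names: biggest table first, then most selective filter first."""
    if not tables:
        return []
    biggest = max(tables, key=lambda t: t.total_rows)
    rest = [t for t in tables if t is not biggest]
    rest.sort(key=lambda t: t.selectivity)
    return [biggest.name] + [t.name for t in rest]


tables = [
    TableStats("customers", 1_000_000, 500_000),    # keeps 50% of rows
    TableStats("sales", 50_000_000, 50_000_000),    # biggest table, no filter
    TableStats("regions", 100, 5),                  # keeps 5% of rows
    TableStats("products", 20_000, 4_000),          # keeps 20% of rows
]
print(join_order(tables))  # ['sales', 'regions', 'products', 'customers']
```

Here regions keeps only 5% of its rows, so it is joined immediately after the big sales table, while customers, with the least selective filter, comes last.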
Note
The JOIN clause is a broad topic, so I have introduced it here only for reference; I suggest studying SQL documentation on JOIN to learn more about it.
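The two ways of expressing a join condition mentioned in the JOIN description, naming the join columns in the ON clause or comparing columns from both tables in the WHERE clause, can be compared with Python's built-in sqlite3 module. SQLite stands in for Impala here only so that the example runs without a cluster; the customers and orders tables and their rows are made up for illustration. For a simple inner join like this one, both queries return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical sample tables for illustration only.
cur.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Join condition written in the ON clause.
on_rows = cur.execute("""
    SELECT c.name, o.amount
    FROM customers c JOIN orders o ON c.id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

# The same condition written as a comparison in the WHERE clause.
where_rows = cur.execute("""
    SELECT c.name, o.amount
    FROM customers c, orders o
    WHERE c.id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

print(on_rows)                # [('Ann', 25.0), ('Ann', 40.0), ('Bob', 15.0)]
print(on_rows == where_rows)  # True
conn.close()
```

Cy has no orders, so no row for Cy appears in either result: an inner join keeps only the rows that satisfy the join condition.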