Database Reference
In-Depth Information
Fine-tuning Impala performance
In this section, let's review a few key factors that affect Impala performance.
Partitioning
In this method, data is physically divided from frequently queried fields or columns
into different values. This way, when a query is executed, it processes only a specific
partition or a portion of the data, achieving significant faster results than the full data-
set. In general, data files specific to a single table reside in a single directory. Using
partitioning, you can distribute the data in a way such that a fraction of data should be
read, depending on the query and its data-limiting clause. Once a partition is applied
to a table, the data is physically loaded on a different location on the disk based on
the query parameter, which provides faster access to the data when queried by the
SELECT statement and the partition name.
You can write TABLE specific Impala SQL statements as follows to take advantage of
the PARTITIONED BY method:
• Add a partition when creating a table by adding the PARTITIONED BY clause
as follows:
CREATE TABLE […] PARTITIONED BY
• You can modify an existing table to support partitions by using the ALTER
TABLE statement followed by PARTITONED BY as follows:
ALTER TABLE […] PARTITIONED BY
Join queries
It is a well-known fact that a query operating on multiple tables using the JOIN opera-
tion will take a long time to finish if it is not written correctly. Here are some techniques
you can use in this regard:
• A well-known technique for fine-tuning an Impala join query is to specify the
tables in an optimal order by first having the table that has the maximum num-
Search WWH ::




Custom Search