Choosing an appropriate file format and compression type for better performance
Impala is used to process large amounts of data stored in your Hadoop cluster. Hadoop
places no restriction on the type of data that can be stored; however, some file formats
and compression types give better data access performance than others. Impala can query
most of the popular structured and unstructured file formats available in Hadoop,
including files that are compressed. The following are the supported file formats and
compression types in Impala:
File type       File format     Compression type
Text            Unstructured    LZO
Avro            Structured      GZIP, BZIP2, deflate, Snappy
RCFile          Structured      GZIP, BZIP2, deflate, Snappy
SequenceFile    Structured      GZIP, BZIP2, deflate, Snappy
Parquet         Structured      GZIP, Snappy (Default)
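As a rough illustration, the combinations listed above can be captured in a small lookup that is checked before a table is created. The following is a minimal Python sketch, not part of Impala itself: the names SUPPORTED, is_supported, and parquet_table_statements are hypothetical helpers, and the generated SQL assumes a recent Impala release in which the query option is called COMPRESSION_CODEC (older releases used a different option name).

```python
# Supported combinations, transcribed from the table above.
# Keys are lowercase file type names; values are sets of codec names.
SUPPORTED = {
    "text": {"lzo"},
    "avro": {"gzip", "bzip2", "deflate", "snappy"},
    "rcfile": {"gzip", "bzip2", "deflate", "snappy"},
    "sequencefile": {"gzip", "bzip2", "deflate", "snappy"},
    "parquet": {"gzip", "snappy"},
}


def is_supported(file_type, codec):
    """Return True if the table above lists this file type / codec pair."""
    return codec.lower() in SUPPORTED.get(file_type.lower(), set())


def parquet_table_statements(table, columns, codec="snappy"):
    """Build Impala SQL for a Parquet table (hypothetical helper).

    `columns` is a list of (name, type) pairs. The codec defaults to
    Snappy, which the table above marks as the Parquet default.
    """
    if not is_supported("parquet", codec):
        raise ValueError(f"codec {codec!r} is not listed for Parquet")
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    return [
        # Assumption: the option name COMPRESSION_CODEC applies to your version.
        f"SET COMPRESSION_CODEC={codec.lower()};",
        f"CREATE TABLE {table} ({cols}) STORED AS PARQUET;",
    ]


if __name__ == "__main__":
    for stmt in parquet_table_statements("sales", [("id", "BIGINT"), ("amount", "DOUBLE")]):
        print(stmt)
```

The statements are returned as strings rather than executed, so they can be pasted into impala-shell or passed to whatever client you already use.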
Now let's take a look at how choosing a proper file format can improve performance
in Impala: