Compression type    Why use it?
LZO                 Use only with text files
BZIP2               Not a top choice, but Impala can read BZIP2-compressed input files
Deflate             Not a first or second choice; however, Impala can read Deflate-compressed input files
The following are a few considerations to keep in mind when choosing an
appropriate file format for a table with Impala:
• When CREATE TABLE is used with Impala and no file format is specified, text
is the default format. Text files are easy for humans to read, which helps
when troubleshooting problems; however, they are slow to process at large
data volumes because of the significant disk read activity involved.
• When performance is your primary consideration, use Snappy; when saving
disk space is your primary consideration, use GZIP. LZO is also an option
with text files and can speed processing up a little.
• If your source files are already in one of Impala's supported types, create
the Impala table using that same file format in most cases, unless changing
the format gives you a significant improvement in processing the source
data.
• If you want to change the file format later in Impala, first use CREATE
TABLE to create a table with the desired file format and then use the
INSERT statement to copy the data into that table. This requires a one-time
conversion of the files from the source format to the new format.
• Data compression does not always mean faster processing through reduced
disk I/O. Uncompressing the data requires CPU cycles before processing can
begin, so it adds time elsewhere. Sometimes uncompressed data is processed
so much faster that the gain in speed compensates for the extra disk space
needed to store it uncompressed.
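
The trade-off between disk space and CPU time described in the last bullet can
be seen in a small experiment outside Impala. The following is only a sketch
using Python's standard library, not Impala's own code path: Snappy and LZO are
not in the standard library, so zlib (the DEFLATE algorithm that GZIP also
uses) at a fast and a thorough level, plus bz2, stand in as examples. The
measure helper and the sample rows are hypothetical, and the timings will
differ from machine to machine.

```python
import bz2
import time
import zlib


def measure(name, compress, decompress, payload):
    """Compress and decompress payload once; report size and wall-clock time."""
    start = time.perf_counter()
    packed = compress(payload)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress(packed)
    decompress_s = time.perf_counter() - start

    # The round trip must be lossless, otherwise the comparison is meaningless.
    assert restored == payload
    return {
        "codec": name,
        "bytes": len(packed),
        "ratio": len(payload) / len(packed),
        "compress_s": compress_s,
        "decompress_s": decompress_s,
    }


# Hypothetical sample data: repetitive comma-delimited rows, similar in shape
# to what a text-format table might hold.
payload = b"".join(
    b"%d,customer_%d,2014-01-%02d,%d.50\n" % (i, i % 100, i % 28 + 1, i % 500)
    for i in range(50_000)
)

results = [
    measure("none", lambda b: b, lambda b: b, payload),
    measure("zlib-1", lambda b: zlib.compress(b, 1), zlib.decompress, payload),
    measure("zlib-9", lambda b: zlib.compress(b, 9), zlib.decompress, payload),
    measure("bz2", bz2.compress, bz2.decompress, payload),
]

for r in results:
    print(
        f"{r['codec']:<8} {r['bytes']:>10} bytes  "
        f"ratio {r['ratio']:6.2f}  "
        f"compress {r['compress_s'] * 1000:8.2f} ms  "
        f"decompress {r['decompress_s'] * 1000:8.2f} ms"
    )
```

On data like this, the compressed variants take far fewer bytes, but each one
spends CPU time that the uncompressed case does not. Whether the reduced disk
I/O outweighs that CPU cost depends on the workload, which is why it is worth
measuring with your own data before settling on a format.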