the place where data is stored, which is one of the main reasons why Impala is
very fast compared with other large-scale data processing systems.
Before we learn more about Impala, let's see what the key Impala features are:
• First and foremost, Impala is 100% open source under the Apache license
• Impala is a native MPP engine, running on the Cloudera Hadoop distribution
• Impala supports in-memory processing for data through SQL-like queries
• Impala uses the Hadoop Distributed File System (HDFS) and HBase
• Impala supports integration with leading business intelligence (BI) tools, such as
Tableau, Pentaho, MicroStrategy, Zoomdata, and so on
• Impala supports a wide variety of input file formats, including regular text files,
files in CSV/TSV or other delimited formats, sequence files, Avro, RCFile,
LZO-compressed text, and Parquet
• For third-party application connectivity, Impala supports an ODBC driver, a
SQL-like syntax, and the Beeswax GUI (in Apache Hue) from Apache Hive
• Impala uses Kerberos authentication and role-based authorization with
Sentry
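To make the file format and SQL-like syntax points above concrete, here is a minimal sketch in Python that only builds Impala-style CREATE TABLE statements as strings. The helper name create_table_sql, the table names, and the column definitions are hypothetical and chosen for illustration; nothing here connects to a cluster, and the format keywords should be checked against the Impala version in use.

```python
# Hypothetical helper: builds Impala-style CREATE TABLE statements as strings.
# It does not connect to Impala or validate the statement against a cluster.

# Short names (assumed for this sketch) mapped to STORED AS keywords.
STORED_AS = {
    "text": "TEXTFILE",
    "sequence": "SEQUENCEFILE",
    "rcfile": "RCFILE",
    "avro": "AVRO",
    "parquet": "PARQUET",
}


def create_table_sql(table, columns, file_format="text", delimiter=None):
    """Return a CREATE TABLE statement for the given columns and file format."""
    if file_format not in STORED_AS:
        raise ValueError(f"unsupported format: {file_format}")
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    parts = [f"CREATE TABLE {table} ({cols})"]
    if delimiter is not None:
        # This sketch only allows a field delimiter for delimited text files.
        if file_format != "text":
            raise ValueError("this sketch only accepts a delimiter for text files")
        parts.append(f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'")
    parts.append(f"STORED AS {STORED_AS[file_format]}")
    return " ".join(parts)


columns = [("id", "INT"), ("name", "STRING")]
csv_ddl = create_table_sql("users_csv", columns, "text", delimiter=",")
parquet_ddl = create_table_sql("users_parquet", columns, "parquet")
print(csv_ddl)
print(parquet_ddl)
```

The same column list yields a comma-delimited text table and a Parquet table; only the storage clause differs, which is why the query side stays the same regardless of the underlying format.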
The key benefits of using Impala are:
• Impala uses Hive to read a table's metadata; however, it processes data with its
own distributed execution engine, which makes data processing very fast. So the
very first benefit of using Impala is very fast access to data in HDFS.
• Impala uses a SQL-like syntax to interact with data, so you can leverage
existing BI tools to interact with data stored on Hadoop. Engineers with
SQL expertise can benefit from Impala because they do not need to learn new
languages and skills. Additionally, Impala offers higher performance and
execution speed.
• Because it runs on Hadoop, Impala leverages Hadoop's file and data formats,
metadata, resource management, and security.
• As Impala interacts with the data as it is stored in Hadoop, it preserves the full
fidelity of the data during analysis, with no loss due to aggregation or
conformance to fixed schemas.
• Impala performs interactive analysis directly on the data stored on Hadoop
DataNodes without requiring data movement, which results in lightning-fast
query results, because there are no network bottlenecks and no time is spent
moving data.
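The data-locality benefit in the last point can be illustrated with a toy model. The sketch below is not Impala's scheduler; the node names, block IDs, replica layout, and the 128 MB block size are all assumptions made for illustration. It assigns each block scan to a node that already holds a replica and compares the raw data moved with a plan that pulls every block to a single node.

```python
# Toy model only: this is not Impala's scheduler. Node names, block IDs,
# replica placement, and the block size are assumptions for illustration.


def plan_local_scans(block_locations):
    """Assign each block to a node that already stores a replica of it."""
    load = {}  # node -> number of blocks assigned so far
    plan = {}  # block id -> node chosen to scan it
    for block in sorted(block_locations):
        replicas = block_locations[block]
        # Pick the least-loaded replica holder; ties break by node name.
        node = min(replicas, key=lambda n: (load.get(n, 0), n))
        plan[block] = node
        load[node] = load.get(node, 0) + 1
    return plan


def bytes_moved(plan, block_locations, block_size):
    """Bytes of raw block data that must cross the network before scanning."""
    return sum(
        block_size
        for block, node in plan.items()
        if node not in block_locations[block]
    )


# Hypothetical layout: each block has replicas on two of three nodes.
block_locations = {
    "blk_1": ["node_a", "node_b"],
    "blk_2": ["node_b", "node_c"],
    "blk_3": ["node_a", "node_c"],
    "blk_4": ["node_b", "node_c"],
}
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes

local_plan = plan_local_scans(block_locations)
central_plan = {block: "node_a" for block in block_locations}

local_bytes = bytes_moved(local_plan, block_locations, BLOCK_SIZE)
central_bytes = bytes_moved(central_plan, block_locations, BLOCK_SIZE)
print(local_bytes, central_bytes)
```

In this layout the local plan moves no raw block data, while the centralized plan must first copy the two blocks that node_a does not store. Only the much smaller intermediate results would need to travel between nodes in the local case, which this model does not attempt to measure.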