Using a Hive UDF requires that we use the HiveContext instead of a regular
SQLContext. To make a Hive UDF available, simply call
hiveCtx.sql("CREATE TEMPORARY FUNCTION name AS class.function").
Spark SQL Performance
As alluded to in the introduction, Spark SQL's higher-level query language and
additional type information allow Spark SQL to be more efficient.
Spark SQL is for more than just users who are familiar with SQL. Spark SQL makes it
very easy to perform conditional aggregate operations, like computing the sums of
multiple columns (as shown in Example 9-40), without having to construct special
objects as we discussed in Chapter 6.
Example 9-40. Spark SQL multiple sums
SELECT SUM(user.favouritesCount), SUM(retweetCount), user.id FROM tweets
GROUP BY user.id
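To see what a query of this shape returns without a Spark cluster, here is a minimal sketch that runs the same aggregation on Python's built-in sqlite3 module as a stand-in engine. This is not Spark SQL: SQLite has no nested fields, so the flattened column names user_id and user_favouritesCount, as well as the sample rows, are assumptions made for illustration.

```python
import sqlite3

# Hypothetical, flattened stand-in for the tweets table. In Spark SQL the
# fields would be user.id and user.favouritesCount; SQLite needs flat columns.
sample_tweets = [
    (1, 10, 2),  # (user_id, user_favouritesCount, retweetCount)
    (1, 5, 3),
    (2, 7, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets ("
    "user_id INTEGER, user_favouritesCount INTEGER, retweetCount INTEGER)"
)
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?)", sample_tweets)

# Same shape as Example 9-40: two sums computed in one pass per user.
# ORDER BY is added only to make the result order deterministic here.
query = """
SELECT SUM(user_favouritesCount), SUM(retweetCount), user_id
FROM tweets
GROUP BY user_id
ORDER BY user_id
"""
sums_by_user = conn.execute(query).fetchall()
conn.close()

for favourites_sum, retweets_sum, user_id in sums_by_user:
    print(user_id, favourites_sum, retweets_sum)
```

Both sums come back in a single row per user, which is the convenience the text describes: no custom accumulator object is needed to track several aggregates at once.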
Spark SQL is able to use the knowledge of types to more efficiently represent our
data. When caching data, Spark SQL uses in-memory columnar storage. This not
only takes up less space when cached, but also lets Spark SQL minimize the data
read when our subsequent queries depend only on subsets of the data.
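The benefit of a columnar layout can be sketched in plain Python. The example below is a toy model, not Spark SQL's actual cache format: the record fields and helper names are hypothetical, and "fields touched" is only a rough proxy for the amount of data read.

```python
# Row-oriented layout: each record keeps all of its fields together.
row_store = [
    {"user_id": 1, "text": "hello", "retweetCount": 2},
    {"user_id": 2, "text": "spark", "retweetCount": 5},
    {"user_id": 3, "text": "sql", "retweetCount": 1},
]


def to_columnar(rows):
    """Pivot a list of records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_retweets_rows(rows):
    """Model a row scan: whole records are loaded to reach one field."""
    fields_touched = 0
    total = 0
    for row in rows:
        fields_touched += len(row)  # every field of the record is loaded
        total += row["retweetCount"]
    return total, fields_touched


def total_retweets_columns(columns):
    """Model a columnar scan: only the requested column is read."""
    values = columns["retweetCount"]
    return sum(values), len(values)


column_store = to_columnar(row_store)
row_result = total_retweets_rows(row_store)
column_result = total_retweets_columns(column_store)
print(row_result, column_result)
```

Both scans produce the same total, but the columnar version touches only one of the three columns. Values of a single type stored side by side also tend to compress better, which is one reason a columnar cache can take less space.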
Predicate push-down allows Spark SQL to move some parts of our query “down” to
the engine we are querying. If we wanted to read only certain records in Spark, the
standard way to handle this would be to read in the entire dataset and then execute a
filter on it. However, in Spark SQL, if the underlying data store supports retrieving
only subsets of the key range, or another restriction, Spark SQL is able to push the
restrictions in our query down to the data store, resulting in potentially much less
data being read.
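The difference between filtering after a full read and pushing the restriction down can be sketched with a toy data store. The KeyRangeStore class and its methods are hypothetical and exist only for this illustration; they are not part of Spark or of any real data source API.

```python
import bisect


class KeyRangeStore:
    """Toy store of (key, value) records kept sorted by key (hypothetical)."""

    def __init__(self, records):
        self.records = sorted(records)
        self.keys = [key for key, _ in self.records]
        self.records_read = 0  # how many records the store had to hand out

    def scan_all(self):
        """Return every record; any filtering is left to the caller."""
        self.records_read += len(self.records)
        return list(self.records)

    def scan_range(self, low, high):
        """Return only records with low <= key <= high (pushed-down filter)."""
        start = bisect.bisect_left(self.keys, low)
        end = bisect.bisect_right(self.keys, high)
        self.records_read += end - start
        return self.records[start:end]


records = [(key, "value-%d" % key) for key in range(1000)]

# Without push-down: read the entire dataset, then filter it.
naive_store = KeyRangeStore(records)
naive = [r for r in naive_store.scan_all() if 100 <= r[0] <= 109]

# With push-down: the store itself applies the key-range restriction.
pushdown_store = KeyRangeStore(records)
pushed = pushdown_store.scan_range(100, 109)

print(naive_store.records_read, pushdown_store.records_read)
```

Both approaches return the same ten records, but the store that accepts the restriction hands out far fewer of them. That reduction in data read is the saving the paragraph above describes.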
Performance Tuning Options
There are a number of different performance tuning options with Spark SQL; they're
listed in Table 9-2.
Table 9-2. Performance options in Spark SQL
Option | Default | Usage
spark.sql.codegen | false | When true, Spark SQL will compile each query to Java bytecode on the fly. This can improve performance for large queries, but codegen can slow down very short queries.