Option                                         Default   Usage
spark.sql.inMemoryColumnarStorage.compressed   false     Compress the in-memory columnar storage automatically.
spark.sql.inMemoryColumnarStorage.batchSize    1000      The batch size for columnar caching. Larger values may cause out-of-memory problems.
spark.sql.parquet.compression.codec            snappy    Which compression codec to use. Possible options include uncompressed, snappy, gzip, and lzo.
Using the JDBC connector and the Beeline shell, we can set these performance options, and any others, with the set command, as shown in Example 9-41.
Example 9-41. Beeline command for enabling codegen
beeline> set spark.sql.codegen=true;
SET spark.sql.codegen=true
spark.sql.codegen=true
Time taken: 1.196 seconds
In a traditional Spark SQL application, we can instead set these Spark properties on our Spark configuration, as shown in Example 9-42.
Example 9-42. Scala code for enabling codegen
// conf is the application's SparkConf; set options before creating the SparkContext
conf.set("spark.sql.codegen", "true")
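Whether they are set through Beeline or through SparkConf, these options are plain string key/value pairs. The minimal sketch below is plain Scala with no Spark dependency. It records the defaults from the table above and turns a map of overrides into the set key=value; statements that Beeline accepts. It also rejects Parquet codecs that the table does not list. The names SparkSqlOptions and beelineSetCommands are hypothetical. The codec check only mirrors the table, and Spark's own validation may differ.

```scala
// Hypothetical helper object; the option keys and defaults come from the table above.
object SparkSqlOptions {
  val defaults: Map[String, String] = Map(
    "spark.sql.inMemoryColumnarStorage.compressed" -> "false",
    "spark.sql.inMemoryColumnarStorage.batchSize"  -> "1000",
    "spark.sql.parquet.compression.codec"          -> "snappy"
  )

  // Codecs listed in the table for spark.sql.parquet.compression.codec
  // (an assumption: newer Spark versions may accept others).
  val parquetCodecs: Set[String] = Set("uncompressed", "snappy", "gzip", "lzo")

  // Render overrides as Beeline "set key=value;" statements, sorted by key
  // so the output is deterministic. Throws IllegalArgumentException for an
  // unlisted Parquet codec.
  def beelineSetCommands(overrides: Map[String, String]): Seq[String] = {
    overrides.get("spark.sql.parquet.compression.codec").foreach { codec =>
      require(parquetCodecs.contains(codec), s"unknown Parquet codec: $codec")
    }
    overrides.toSeq.sortBy(_._1).map { case (key, value) => s"set $key=$value;" }
  }
}
```

For example, SparkSqlOptions.beelineSetCommands(Map("spark.sql.codegen" -> "true")) yields Seq("set spark.sql.codegen=true;"). Sorting by key keeps the generated script stable from run to run.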
A few options warrant special attention. The first is spark.sql.codegen, which causes Spark SQL to compile each query to Java bytecode before running it. Codegen can make long or frequently repeated queries substantially faster, because it generates specialized code to run them. However, with very short (1-2 second) ad hoc queries, it may add overhead, since it has to run a compiler for each query (see footnote 1). Codegen is also still experimental, but we recommend trying it for any workload with large queries or with the same query repeated over and over.
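To check whether codegen helps a particular workload, you can time the same query with spark.sql.codegen set to true and then to false. Footnote 1 advises discarding the first few runs, because those runs pay the cost of initializing the compiler. The minimal, Spark-independent sketch below does exactly that. The name timeAfterWarmup is hypothetical, and the run counts are illustrative.

```scala
// Hypothetical timing harness: run `body` warmupRuns times and ignore those
// runs (e.g. while the codegen compiler initializes). Then return the mean
// wall-clock time of measuredRuns further runs, in milliseconds.
object QueryTiming {
  def timeAfterWarmup[A](warmupRuns: Int, measuredRuns: Int)(body: => A): Double = {
    require(warmupRuns >= 0, "warmupRuns must be non-negative")
    require(measuredRuns > 0, "need at least one measured run")
    // Warm-up runs: results and timings are discarded.
    (1 to warmupRuns).foreach(_ => body)
    // Measured runs: elapsed nanoseconds converted to milliseconds.
    val timesMs = (1 to measuredRuns).map { _ =>
      val start = System.nanoTime()
      body
      (System.nanoTime() - start) / 1e6
    }
    timesMs.sum / measuredRuns
  }
}
```

In a Spark 1.x shell, the body might be a query such as sqlContext.sql(...).collect(). Calling collect forces the query to actually execute. You would then compare the averages with codegen enabled and disabled.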
The second option you may need to tune is spark.sql.inMemoryColumnarStorage.batchSize. When caching SchemaRDDs, Spark SQL groups together the records in the RDD in batches of the size given by this option (default: 1000), and compresses each batch. Very small batch sizes lead to low compression, but on the other hand,
1 Note that the first few runs of codegen will be especially slow as it needs to initialize its compiler, so you
should run four to five queries before measuring its overhead.