very large sizes can also be problematic, as each batch might be too large to build up
in memory. If the rows in your tables are large (i.e., contain hundreds of fields or
contain string fields that can be very long, such as web pages), you may need to lower
the batch size to avoid out-of-memory errors. If not, the default batch size is likely
fine, as there are diminishing returns for extra compression when you go beyond
1,000 records.
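For example, the batch size used when caching can be tuned through the spark.sql.inMemoryColumnarStorage.batchSize property. The sketch below (Scala, Spark 1.x-style SQLContext API; the table name and the chosen value are illustrative, not prescriptive) lowers it for a table with very wide rows:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("BatchSizeTuning"))
val sqlContext = new SQLContext(sc)

// Lower the columnar-cache batch size (default 1,000 records) so each
// compressed batch of very wide rows still fits comfortably in memory.
// The value 100 is purely illustrative; tune it for your row sizes.
sqlContext.setConf("spark.sql.inMemoryColumnarStorage.batchSize", "100")

// Subsequent caching uses the smaller batch size.
// "pages" is a hypothetical table registered earlier in the pipeline.
sqlContext.cacheTable("pages")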
Conclusion
With Spark SQL, we have seen how to use Spark with structured and semistructured
data. In addition to the queries explored here, it's important to remember that our
previous tools from Chapter 3 through Chapter 6 can be used on the SchemaRDDs
Spark SQL provides. In many pipelines, it is convenient to combine SQL (for its
conciseness) with code written in other programming languages (for their ability to
express more complex logic). When you use Spark SQL to do this, you also gain some
optimizations from the engine's ability to leverage schemas.
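As a sketch of this mixed style (Scala, Spark 1.x-style API; the "pages" table and its fields are hypothetical), a query's SchemaRDD result can feed directly into the RDD transformations covered in earlier chapters:

import org.apache.spark.SparkContext._   // pair RDD operations on older Spark versions
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("MixedPipeline"))
val sqlContext = new SQLContext(sc)

// SQL handles the concise part: selecting and filtering columns.
// "pages" is assumed to have been registered as a table earlier in the pipeline.
val htmlPages = sqlContext.sql(
  "SELECT url, content FROM pages WHERE url LIKE '%.html'")

// Plain Scala handles the more complex logic: the result is a SchemaRDD
// (an RDD of Rows), so flatMap, map, and reduceByKey all apply directly.
val wordCounts = htmlPages
  .flatMap(row => row.getString(1).split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)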