Press "Enter" to skip to content

Session, DataFrameWriter, and Table Configurations in Spark

Miles Cole makes a configuration change:

With Spark and Delta Lake, just like with Hudi and Iceberg, there are several ways to enable or disable settings that impact how tables are created. These settings may affect data layout or table format features, but it can be confusing to understand why different methods exist, when each should be used, and how property inheritance works.

While platform defaults should account for most use cases, Spark provides flexibility to optimize various workloads, whether adjusting for read or write performance, or for hot or cold path data processing. Inevitably, the need to adjust configurations from the default will arise. So, how do we do this effectively?

Read on to learn how.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.