Spark 2.0 Out

Apache Spark 2.0 has officially been released.  Vinay Shukla gives us some highlights:

Performance
Project Tungsten has completed another major phase and with new completely new stage code generation, significant performance improvements have been delivered. Parquet and ORC file processing have also delivered performance improvements.

Databricks Community Edition offers (tiny) free clusters with Spark 2.0 on top of Scala 2.10 and Scala 2.11.

Related Posts

When Not to Use Spark

Ramandeep Kaur gives us several cases when it makes sense not to use Apache Spark: There can be use cases where Spark would be the inevitable choice. Spark considered being an excellent tool for use cases like ETL of a large amount of a dataset, analyzing a large set of data files, Machine learning, and […]

Read More

What Compatibility Level 150 Gets You

Erik Darling explains the upsides and downsides of moving to SQL Server 2019 and compatibility level 150: In those versions, flipping compatibility level uses the new Cardinality Estimator (CE). That new Cardinality Estimator is real hit or miss. The worst part is that there’s practically no gain to be realized for using higher compatibility levels […]

Read More

Categories

July 2016
MTWTFSS
« Jun Aug »
 123
45678910
11121314151617
18192021222324
25262728293031