Day: June 3, 2025

What’s New in Apache Spark 4.0

Ram Ghadiyaram looks at recent updates to Apache Spark:

Hurray! Apache Spark 4.0, released in 2025, redefines big data processing with innovations that enhance performance, accessibility, and developer productivity. With contributions from over 400 developers across organizations like Databricks, Apple, and NVIDIA, Spark 4.0 resolves thousands of JIRA issues, introducing transformative features: native plotting in PySpark, Python Data Source API, polymorphic User-Defined Table Functions (UDTFs), state store enhancements, SQL scripting, and Spark Connect improvements. This report provides an in-depth exploration of these features, their technical underpinnings, and practical applications through original examples and diagrams.

Click through to see what’s on the list of major features.

Apache Spark 3.5 Support in Azure Synapse Analytics

Arshad Ali has an announcement:

You can now create Azure Synapse Runtime for Apache Spark 3.5. The essential changes include features which come from upgrading Apache Spark to version 3.5 and Delta Lake 3.2. Please review the official release notes for Apache Spark 3.5 to check the complete list of fixes and features. In addition, review the migration guidelines between Spark 3.4 and 3.5 to assess potential changes to your applications, jobs and notebooks. 

Credit where credit is due: I’ve made light of the utter lack of work on Azure Synapse Analytics since Microsoft Fabric’s release. But hey, they did a thing. Granted, the impetus behind this was to “prepare for migrating to Microsoft Fabric Spark.”

SQL Server Performance Office Hours

Erik Darling is back with a new episode of office hours:

Do you know of any disadvantages of using a filtered index to filter NULL values? We have a very heavy transactional table, like 10k trans/sec, with a clustered index and one non-clustered index. We don’t have any queries that select rows with NULL values from this table. The DBA team said we should avoid using a filtered index without any proof. What do you think?

Click through for Erik’s answers in video form. I was workshopping a joke around how all of the evidence Erik has of me being mean to him are lies, but couldn’t make it work without riding the line of “Wait…is he serious?”

Controlling Selections in Calculation Groups

Marco Russo and Alberto Ferrari look at calculation groups:

Calculation groups are often used to display options in a report to change the calculation of existing measures by selecting items on a slicer. However, only a single calculation item can be executed for a measure reference, which could make the semantic model harder to use when the user selects two or more items in a calculation group.

Two new calculation group properties, multipleOrEmptySelectionExpression and noSelectionExpression, provide a way to control the calculation in these conditions that, so far, ignored the presence of the calculation group, thus executing the measures without applying any transformation. This article shows how to use these features and provides guidance on using the feature in preview: despite not having a user interface to manage these new properties, the TMDL view in Power BI Desktop and external tools like Tabular Editor already allow you to create and publish a semantic model that uses these new properties.

Read on to see how these properties work.

Date and Time Data Types in MySQL and PostgreSQL

Aisha Bukar compares and contrasts:

MySQL and PostgreSQL offer several data types that can be used for handling dates and times. These data types provide the tools to store and manage information like dates of a particular event, timestamps, and even time durations. While they both share some similarities in how they handle date and time, there are key differences in how they handle precision, time zones, and date/time calculations.

Getting date and time data right is key for keeping databases accurate and useful. In this article, we will compare how MySQL and PostgreSQL handle date and time data, their differences, strengths, and which one might work better for your needs. By the end, you’ll have a clearer idea of which database to choose for managing date and time information.

Click through to learn about the two platforms.

Session Variables in PostgreSQL

Kaarel Moppel talks session variables:

Animated by some comments / complaints about Postgres’ missing user variables story on a Reddit post about PostgreSQL pain points in the real world – I thought I’d elaborate a bit on sessions vars – which is indeed a little known Postgres functionality.

Although this “alley” has existed for ages – and one can also use injected session variables to implement crazy stuff like token based Row Level Security or store huge and complex JSON state, or just keep a bit of DB-side state over essentially stateless statement-level connection pools – should you actually use it? What are the alternatives instead? Read on …

Click through to learn more.

Working with Memory-Optimized tempdb

Haripriya Naidu deals with metadata contention:

This feature is specifically designed to reduce metadata contention. Note that adding data files will not resolve metadata contention, as that addresses a different type of contention.

You can learn more about enabling this feature and its benefits here.

A company I used to work for was a perfect candidate for this, except that the limitations meant that we couldn’t actually use it. We ended up switching some of our most frequently recurring temp tables and table-valued parameters to memory-optimized user-defined table types, which got us out of our metadata contention mess without using this feature.

Sharing Power BI Reports across Tenants

Soheil Bakhshi does a bit of sharing:

In this post, we’ll focus on a practical scenario. One organisation, let’s call it Tenant A, wants to share a Power BI report with someone from another organisation, Tenant B. We’ll cover everything from verifying licenses to configuring the Fabric Admin Portal and inviting the external user. If you’re looking to follow along, this guide will give you a clear path to replicate the same setup in your environment.

Click through for the process.
