Press "Enter" to skip to content

Month: March 2025

Performance Profiles of SUM() and SUMX() in DAX

Eugene Meidinger blows the doors off of this conspiracy:

For years, I told people to avoid iterators. I compared them to cursors in SQL, which are really bad, or for loops in C# which are normally fine. I knew that DAX was column based and that it often broke down when doing row-based operations, but I couldn’t tell you why.

Advice to avoid iterators is often based on a misunderstanding and a misapprehension of how the Vertipaq engine works. If you are blindly giving this advice out, like I was, you are promoting a fundamental misunderstanding of how DAX works.

Eugene has since opened his eyes and can see through the lies.

Leave a Comment

The Design of DataDiluvium

Adron Hall wraps up a series on DataDiluvium. First up is some commentary on data generation:

In Parts 1 and 2, I set up the development environment and implemented the schema parsing functionality. Now, I’ll explore the data generation system and final implementation details that make DataDiluvium a complete solution.

And finally we have some additional notes and wrapup:

In my previous three posts, I covered the core functionality of DataDiluvium. In this follow-up post, I’ll explore the additional features, utilities, and implementation details that I’ve added to enhance the application’s functionality and developer experience.

Check out both posts. I do enjoy seeing people walk through and explain some of the key concepts and decisions they’ve made when developing solutions.

Leave a Comment

Speeding up Dataflow Validation and Publish Times

Chris Webb doesn’t want to wait:

If you’re working with slow data sources in Power BI/Fabric dataflows then you’re probably aware that validation (for Gen1 dataflows) or publishing (for Gen2 dataflows) them can sometimes take a long time. If you’re working with very slow data sources then you may run into the 10 minute timeout on validation/publishing that is documented here. For a Gen1 dataflow you’ll see the following error message if you try to save your dataflow and validation takes more than 10 minutes:

Click through for that common error message, as well as some tips to avoid this issue. There was also an interesting approach in the comments section that circumvented the problem as well.

Leave a Comment

Virtualizing Hadoop Data into OneLake via Apache Ozone

James Morantus hits us with a blast from the past:

Microsoft Fabric OneLake shortcuts facilitate the virtualization of data from various cloud object stores and on-premises environments. For on-premises sources like Cloudera/Apache Ozone, the OneLake S3 Compatible Shortcut can be utilized to connect to these data sources. With OneLake Shortcuts, users can create a virtual reference to their Cloudera cluster data without moving or duplicating the data. To learn more about Fabric OneLake shortcuts, reference this blog OneLake with shortcuts.

If this was a decade ago, I’d be a lot more excited. But it is kind of wild how quickly the data landscape changed, with the adoption of Spark over classic Hadoop; cloud-based data lakes over HDFS; and more focused dataset sizes over “give me everything.”

Leave a Comment

Integer Conversion and Rounding in SQL Server

Steve Jones points out a bit of rounding math:

Imagine that I have someone enter a value for the number of hours to include in a report. I enter 5 and the report divides this in half to go back 2.5 hours and forward 2.5 hours. I run this code at the top of my code block:

Click through for Steve’s example. This ultimately has to do with integer division. If you run the following code, you’ll still get 2 as the result:

SELECT CAST(5.99 / 2) AS INT;

This is because SQL Server discards the decimal during integer casting. DATEADD() simply works with the end result, post-cast.

Leave a Comment

Essential Settings for SSMS

Rich Benner has some thoughts on configuration:

By default, SSMS does not show row numbers next to your code. That’s not a huge issue when working with smaller blocks of T-SQL. However, when working with larger problems and/or collaborating with colleagues, being able to refer to a row number is invaluable. This is especially important for us here at SQL Solutions Group when we’re in meetings and looking at code. Being able to refer to a line number on somebody else’s screen makes the world so much easier and less confusing.

Line numbers being on should absolutely be a default setting in SSMS. I can’t think of any reason why you wouldn’t want line numbers on. Rich also looks at other ways you can customize the look and feel of SSMS, such as map mode for the vertical scroll bar, organizing pinned tabs, and more.

Leave a Comment

String Data Types in MySQL and PostgreSQL

Aisha Bukar compares two products:

A very common task in creating a database is to store string data. For example, words, paragraph(s) or even documents. String data types allow you to do just that and store and represent text. They handle everything from simple names and addresses to complex data.

A string is simply a sequence of characters. These characters can be letters, numbers, symbols, or even spaces. For example, “Simple Talk”, “MySQL and PostgreSQL”, “1234” are all strings. Think of each character as a building block. A string is made up of these blocks, arranged in a specific order.

As always, when dealing with different data platform technologies, the small differences are big.

Leave a Comment

Thoughts on Dark Mode Reports in Power BI

Elena Drakulevska reminds us to think of the user:

Lately, there’s been a lot of hype around dark-mode reports—especially now that dark mode is officially a thing in Power BI. It’s sleek, it’s trendy, and, let’s be honest, it looks pretty cool.

But before we all jump on the dark-mode train, let’s pause for a second. Because, like most things in design, IT DEPENDS.

Click through for a primer on accessibility. Meanwhile, my hottest design take is that dark mode is wildly overrated. Invest in proper task lighting.

Leave a Comment