Curated SQL Posts

String Parsing in T-SQL

Rob Farley shares some thoughts:

But let’s talk about non-regex methods for parsing strings and the patterns that I use. I find that the biggest issue with most string parsing is complexity. Even something as simple as finding the value between the 2nd and 3rd hyphens can be done in different ways with different levels of complexity, and even if it works, maintaining that code can become really hard.

For example, finding the position of the first hyphen might be as simple as using the CHARINDEX function. Finding the second might involve two CHARINDEX functions, and calling SUBSTRING with parameters that have increasingly nested CHARINDEX calls… well, you can see how the complexity quickly builds.

Rob digs into one of my favorite use cases for the APPLY operator: simplifying calculations, or in this case, simplifying expression chains. Granted, I have also grown to appreciate the DuckDB solution of allowing for function chaining. The demo examples in that documentation are limited, but you can do things like goose_name.lower().replace('goose', 'duck').replace(' ', '') and it will work fine.
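
To make the contrast concrete, here is a minimal sketch of my own (not Rob's code) that pulls the value between the 2nd and 3rd hyphens, first with nested CHARINDEX calls and then with CROSS APPLY giving each hyphen position a name:

  -- Nested version: every CHARINDEX starts searching just past the previous hyphen.
  SELECT SUBSTRING(
             v.Code,
             CHARINDEX('-', v.Code, CHARINDEX('-', v.Code) + 1) + 1,
             CHARINDEX('-', v.Code, CHARINDEX('-', v.Code, CHARINDEX('-', v.Code) + 1) + 1)
                 - CHARINDEX('-', v.Code, CHARINDEX('-', v.Code) + 1) - 1
         ) AS MiddleValue
  FROM (VALUES ('AB-12-XYZ-9')) AS v(Code);

  -- APPLY version: the same logic, but each position is named once and reused.
  SELECT SUBSTRING(v.Code, h2.Pos + 1, h3.Pos - h2.Pos - 1) AS MiddleValue
  FROM (VALUES ('AB-12-XYZ-9')) AS v(Code)
  CROSS APPLY (SELECT CHARINDEX('-', v.Code) AS Pos) AS h1
  CROSS APPLY (SELECT CHARINDEX('-', v.Code, h1.Pos + 1) AS Pos) AS h2
  CROSS APPLY (SELECT CHARINDEX('-', v.Code, h2.Pos + 1) AS Pos) AS h3;

Both queries return XYZ, but only the second one can be read top to bottom without re-deriving which CHARINDEX is which.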

Regular Expressions in Power BI TMDL View Find and Replace

Jon Vöge performs a search:

For this week’s blog, a quick tip about a feature in Power BI desktop which had flown entirely over my head: You can use RegEx for Find & Replace operations in Power BI Desktop TMDL View!

Yes! You heard that right!

I had no idea, until I caught it in a live demo by Power BI partner director Mohammad Ali at his Power BI Next Step keynote.

Read on to see what you can do with this. The same is possible in other tools like Visual Studio Code and even SQL Server Management Studio, though the specific regular expression capabilities and syntax differ from product to product.
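
As a hypothetical illustration (mine, not Jon's), a capture-group rename across a TMDL file might look something like this, with the caveat that the exact replacement syntax ($1 versus \1) varies by tool:

  Find:    column (\w+)_Staging
  Replace: column $1

Run against the whole TMDL view, that would strip the suffix from every matching column declaration in one pass.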

Checking Direct Lake Model Sources

Nikola Ilic wants to know if Direct Lake is using OneLake or SQL:

In my recent Microsoft Fabric training, I’ve been explaining the difference between the Direct Lake on OneLake and Direct Lake on SQL, as two flavors of Direct Lake semantic models. If you are not sure what I’m talking about, please start by reading this article. The purpose of this post is not to examine the differences between these two versions, but rather to clarify some nuances that might arise. One of the questions I got from participants in the training was:

“How do we KNOW if the Direct Lake semantic model is created as a Direct Lake on OneLake or Direct Lake on SQL model?”

Read on for that answer.

Linux Huge Pages and PostgreSQL

Umair Shahid explains the value of huge pages when running PostgreSQL:

Huge pages are a Linux kernel feature that allocates larger memory pages (typically 2 MB or 1 GB instead of the normal 4 KB). PostgreSQL’s shared buffer pool and dynamic shared memory segments are often tens of gigabytes, and using huge pages reduces the number of pages the processor must manage. Fewer page‑table entries mean fewer translation‑lookaside‑buffer (TLB) misses and fewer page table walks, which reduces CPU overhead and improves query throughput and parallel query performance. The PostgreSQL documentation notes that huge pages “reduce overhead … resulting in smaller page tables and less CPU time spent on memory management”.

One thing I found interesting here was that the advice for PostgreSQL is to disable Transparent Huge Pages, whereas for SQL Server on Linux, Microsoft’s recommendation is to keep THP enabled.
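
For context, the usual setup looks something like the sketch below (my paraphrase of the general approach, not Umair's exact steps; the data directory path and page count are illustrative):

  # With the server stopped, ask PostgreSQL (15+) how many huge pages
  # its shared memory would need:
  postgres -D /var/lib/postgresql/data -C shared_memory_size_in_huge_pages

  # Reserve that many 2 MB pages in the kernel, then verify:
  sysctl -w vm.nr_hugepages=4300
  grep Huge /proc/meminfo

  # In postgresql.conf, 'on' fails at startup instead of silently
  # falling back the way the default 'try' does:
  huge_pages = on

  # And, per the advice above, disable Transparent Huge Pages:
  echo never > /sys/kernel/mm/transparent_hugepage/enabled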

Job-Level Bursting in Microsoft Fabric Spark Jobs

Santhosh Kumar Ravindran announces a new feature:

  • Enabled (Default): When enabled, a single Spark job can leverage the full burst limit, consuming up to 3× CUs. This is ideal for demanding ETL processes or large analytical tasks that benefit from maximum immediate compute power.
  • Disabled: If you disable this switch, individual Spark jobs will be capped at the base capacity allocation. This prevents a single job from monopolizing the burst capacity, thereby preserving concurrency and improving the experience for multi-user, interactive scenarios.

Read on for the list of caveats and the note that it will cost extra money to flip that switch.
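
To make the 3× concrete (my own arithmetic, using the documented ratio of one capacity unit to two Spark vCores): an F64 capacity has a base of 64 CUs, or 128 Spark vCores, so with job-level bursting enabled a single job can scale up to 192 CUs, or 384 Spark vCores.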

SSMS Query Hint Recommendation Tool

Brent Ozar tries out a new feature of SQL Server Management Studio:

The maximum tuning time defaults to 300 seconds, but I tacked on a couple zeroes because my slow query already took ~20 seconds to run on its own, and I wanted to give the wizard time to wave his little wand around. The tool actually runs your query repeatedly with different hints, so if you have a 5-minute query, you’ll need to give the tool more time.

Click Start, and it begins running your query with different hints. A couple minutes later, I got:

Brent’s review is quite positive, in a “This is way better than the alternative of doing nothing” sense.
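
Whatever mechanism the wizard uses under the hood, if you wanted to pin a winning hint yourself without editing the query text, one option since SQL Server 2022 is a Query Store hint; a sketch, with a hypothetical query_id:

  -- Look up the query's ID in sys.query_store_query first, then:
  EXEC sys.sp_query_store_set_hints
      @query_id = 42,
      @query_hints = N'OPTION (RECOMPILE)';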

Monitoring Microsoft Fabric Costs

Chris Webb uses a report:

Following on from my blog post a few months ago about cool stuff in the Fabric Toolbox, there is now another really useful solution available there that anyone with Fabric capacities should check out: Fabric Cost Analysis (or FCA). If you have Fabric capacities it’s important to be able to monitor your Azure costs relating to them, so why not monitor your Fabric costs using a solution built using Fabric itself? This is what the folks behind FCA (who include Romain Casteres, author of this very useful blog post on FinOps for Fabric, plus Cédric Dupui, Manel Omani and Antoine Richet) decided to build and share freely with the community.

Click through to see how it works, and check out the FCA link in the graf above to get the code.

The Downside of Zero-Copy Integration between Kafka and Iceberg

Jack Vanlightly lays out an argument:

Over the past few months, I’ve seen a growing number of posts on social media promoting the idea of a “zero-copy” integration between Apache Kafka and Apache Iceberg. The idea is that Kafka topics could live directly as Iceberg tables. On the surface it sounds efficient: one copy of the data, unified access for both streaming and analytics. But from a systems point of view, I think this is the wrong direction for the Apache Kafka project. In this post, I’ll explain why. 

Read on for an explanation of what “zero-copy” means here, as well as Jack’s position on the matter. I think it’s a solid argument and worth the read.
