Curated SQL Posts

Generating Shape-Bound Random Points in SQL Server

Sebastiao Pereira generates some numbers:

Random number generation is vital in computer science, supporting fields like optimization, simulation, robotics, and gaming. The quality, speed, and sometimes security of the generator can directly affect an algorithm's correctness, performance, and competitiveness. In Python, random number generation is well-supported and widely used. In this article, we will look at how we can use SQL to do this.

Click through for several examples.
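As a taste of what's involved (a generic sketch, not necessarily Sebastiao's technique): per-row randomness in SQL Server typically comes from CHECKSUM(NEWID()), since a bare RAND() returns the same value for every row in a result set, and one classic way to bound points to a shape is rejection sampling against the shape's equation.

-- Generate candidate points in the bounding box [-1, 1] x [-1, 1],
-- then keep only those that fall inside the unit circle.
WITH Candidates AS
(
    SELECT TOP (1000)
        -- Map a per-row random integer onto a float in [-1, 1)
        (ABS(CHECKSUM(NEWID())) % 2000000) / 1000000.0 - 1.0 AS x,
        (ABS(CHECKSUM(NEWID())) % 2000000) / 1000000.0 - 1.0 AS y
    FROM sys.all_objects AS a
        CROSS JOIN sys.all_objects AS b
)
SELECT x, y
FROM Candidates
WHERE x * x + y * y <= 1.0;

For a circle inscribed in a square, rejection keeps about pi/4 (roughly 79%) of the candidates, so oversampling the TOP clause is the usual tweak when you need a specific number of points.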

SQL Managed Instance Memory vs Cores

Kendra Little hits a pain point:

Microsoft recently announced that Azure SQL Managed Instance Next-gen General Purpose (GPv2) is now generally available. GPv2 brings significant storage performance improvements over GPv1, and if you’re using GPv1, you should plan to upgrade.

But GPv2 still has the same memory-to-core ratio problem that makes Managed Instance a rough deal for running SQL Server. SQL Server is engineered to use lots of memory—it’s a rare OLTP or mixed-OLTP workload that doesn’t need significant cache for reliable performance. We’ll have a look at the pricing math.

Read on for Kendra's detailed thoughts on GPv2 versus GPv1, as well as where GPv2 still has its warts. For context, General Purpose Managed Instance allocates roughly 5.1 GB of memory per vCore on standard-series hardware, well below what SQL Server on a comparably sized VM could use.

Accessing Excel Files from OneDrive via Power BI

Kristyna Ferris is happy:

I can't believe it's finally here! A way to have Excel live in OneDrive and access it from Power BI nearly live! We can officially shortcut files to our OneLake from both SharePoint and OneDrive! I am super excited about this feature, and I hope you are too. This feature plus User Data Functions allows us to not only have data from Excel in our reports but keep it as fresh as needed. Imagine having budget allocations that you want to adjust right before or during a meeting. Now you can! You can edit a file in Excel and hit one button to see the new numbers in your report. In the past, we relied on 3rd party services or Power Apps licensing to accomplish this sort of experience. Now we can just use Excel, an old data friend.

Kristyna does note that this is in preview, so take it with that caveat in mind and read on to see how it all works.

UUIDv4 and UUIDv7 in PostgreSQL 18

Josef Machytka notes a change:

In the past there have been many discussions about using UUID as a primary key in PostgreSQL. For some applications, even a BIGINT column does not have sufficient range: it is a signed 8‑byte integer with range −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807. Although these values look big enough, if we think about web services that collect billions or more records daily, this number becomes less impressive. Simple integer values can also cause value conflicts in distributed systems, or in Data Lakehouses when combining data from multiple source databases.

However, the main practical problem with UUIDv4 as a primary key in PostgreSQL was not lack of range, but the complete randomness of the values. This randomness causes frequent B‑tree page splits, a highly fragmented primary key index, and therefore a lot of random disk I/O. There have already been many articles and conference talks describing this problem. What many of these resources did not do, however, was dive deep into the on‑disk structures. That’s what I wanted to explore here.

In fairness to BIGINT, if you collect 10 billion records a day, the positive range alone still gives you about 920 million days of activity before you need to reset the ID (9.2 quintillion divided by 10 billion), if I got my math right. But read on to see what UUIDv7 does for GUIDs.
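For reference, PostgreSQL 18 ships both generators as built-in functions, so a minimal sketch of the two approaches looks like this (table and column names are hypothetical):

-- uuidv4() is fully random; uuidv7() is time-ordered, so new rows land at
-- the right-hand edge of the primary key B-tree instead of forcing random
-- page splits.
CREATE TABLE events_v4 (id uuid PRIMARY KEY DEFAULT uuidv4(), payload text);
CREATE TABLE events_v7 (id uuid PRIMARY KEY DEFAULT uuidv7(), payload text);

INSERT INTO events_v7 (payload)
SELECT 'row ' || g
FROM generate_series(1, 5) AS g;

-- The leading 48 bits of a UUIDv7 encode a millisecond Unix timestamp,
-- so ORDER BY id roughly matches insertion order.
SELECT id FROM events_v7 ORDER BY id;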

Troubleshooting a Distributed Availability Group Failure

Jordan Boich digs in:

To give some background on this topology, they have a DAG composed of two individual AGs. Their Global Primary AG (we'll call this AG1) has three replicas, while their Forwarder AG (we'll call this AG2) has two replicas. The replicas in AG1 are all Azure VMs in the same subnet; the replicas in AG2 are all Azure VMs in a separate subnet of their own.

By the time we got logged in and connected, the Global Primary Replica was online and able to receive connections. The secondary replicas in the Global Primary AG, however, were unable to communicate with the Global Primary Replica. This is problem 1. The other issue is that several databases on the Forwarder Primary Replica were in crash recovery. This is problem 2. Considering problem 2 was happening in the DR environment, it was put aside for the time being.

Read on for the troubleshooting process and solution.
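For anyone triaging something similar, the standard AG health DMVs are a reasonable starting point (a generic sketch, not necessarily the queries Jordan ran):

-- Replica connectivity and synchronization health, per AG
SELECT ag.name AS ag_name,
       ar.replica_server_name,
       ars.role_desc,
       ars.connected_state_desc,
       ars.synchronization_health_desc
FROM sys.availability_groups AS ag
    INNER JOIN sys.availability_replicas AS ar
        ON ar.group_id = ag.group_id
    INNER JOIN sys.dm_hadr_availability_replica_states AS ars
        ON ars.replica_id = ar.replica_id;

-- Databases stuck in recovery (problem 2) show up here and in the error log
SELECT name, state_desc
FROM sys.databases
WHERE state_desc <> 'ONLINE';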

Parallelizing Python Code

Osheen MacOscar makes a function faster:

The way it is currently written is how any normal for loop will run, where the current iteration must finish before the next one starts. With this code, we shouldn't need to wait for the previous API call; there is no dependency or anything like that. In theory we could run all of the individual player queries at once and the function would be a lot faster.

Read on to see how.

Building a Lollipop Visual in Power BI

Valerie Junk creates a visual:

In this tutorial I want to show a fun little trick in Power BI. We are going to create a lollipop visual. And yes, I am still searching for a strong business case, but it is a very nice visual and the steps you take to build it can help in many other situations.

If you want to show trends without focusing too much on exact numbers, this visual works surprisingly well. And the best part is that you can build it with the standard line chart.

Read on to see how.

New T-SQL Functions

Tomaz Kastrun has been busy with this year's Advent of SQL Server 2025 blog series. Catching up, Tomaz first looks at base-64 encoding and decoding:

SQL Server 2025 introduces new T-SQL functions for base64 encoding and decoding of varbinary expressions: BASE64_ENCODE() and BASE64_DECODE().

BASE64_ENCODE converts the value of a varbinary expression into a base64-encoded varchar expression.

BASE64_DECODE converts a base64-encoded varchar expression into the corresponding varbinary expression.

I really like this, by the way. Base-64 encoding is quite useful for web transmissions, so having a place to generate that output easily is nice.
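A minimal round-trip sketch of the pair:

DECLARE @raw varbinary(max) = CAST('Curated SQL' AS varbinary(max));
DECLARE @encoded varchar(max) = BASE64_ENCODE(@raw);

SELECT @encoded AS base64_text,                                       -- Q3VyYXRlZCBTUUw=
       CAST(BASE64_DECODE(@encoded) AS varchar(max)) AS round_trip;  -- Curated SQL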

Second up is REGEXP_LIKE():

SQL Server 2025 introduces new T-SQL functions for Regular Expressions (RegEx).

Among these, REGEXP_LIKE() plays the role of LIKE: it indicates whether a regular expression pattern matches a string or appears within it.
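Because it acts as a predicate, it slots straight into a WHERE clause (a minimal sketch; the pattern is purely illustrative):

-- Find databases whose names are letters followed by digits, e.g. Sales2025
SELECT name
FROM sys.databases
WHERE REGEXP_LIKE(name, '^[A-Za-z]+[0-9]+$');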

And third, we have REGEXP_SUBSTR() and REGEXP_REPLACE():

Continuing with SQL Server 2025 T-SQL functions for Regular Expressions, this time covering replace and substring functionality.
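A quick sketch of those two (the literal values are made up for illustration):

SELECT REGEXP_SUBSTR('Order #4821 shipped', '[0-9]+') AS order_number,  -- 4821
       REGEXP_REPLACE('2025-12-01', '-', '/') AS formatted_date;        -- 2025/12/01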

Click through for Tomaz’s thoughts on all five of these functions.

Thoughts on “Real-Time Decisions”

Steve Jones is skeptical:

To be fair, humans might do the same thing and over-react, but mostly we become hesitant with unexpected news. That slowness can be an asset. We often need time to think and come to a decision. Lots of our decisions aren’t always based on hard facts, and a lot of business isn’t necessarily fact driven either. We often put our thumb on the scales when making decisions because there isn’t a clear path based on just data.

Steve’s thrust is about AI but I want to riff on “real-time” in general. First, my standard rant: “real-time” has a specific meaning that people have abused over the years. Fighter pilots need real-time systems. The rest of it is “online.” For a hint as to the difference: if you’re okay waiting 100ms for a response due to network delays or whatever else, that’s not real-time.

Standard rant aside, "I need to see real-time data" is a common demand for data warehousing projects. I worked on a warehouse once where the business wanted up-to-the-minute data, but our incoming data sources for cost and revenue information refreshed once a day per customer, and intraday information was sketchy enough that it didn't make sense to store in a warehouse. But when you probe people on how often they'll actually look at the data, it turns out that hourly or daily loads make more sense based on the review cadence.

The question to ask is: how big is your OODA (observe-orient-decide-act) loop, and is additional information really the limiting factor? Sometimes that answer can be yes, but generally there are other factors preventing action.
