Press "Enter" to skip to content

Curated SQL Posts

Page Compression on Heaps

Vlad Drumea explains why page compression might not give you quite what you expect:

I recently ran into SQL Server’s page compression being applied to a heap, and I figured I’d cover why that won’t work how some folks expect.

SQL Server’s page compression is really neat when applied on tables and indexes that are good candidates for it.
Even more so in cloud environments where storage costs can quickly add up.

Honestly, this is just a good reason to push for clustered indexes on all tables in SQL Server. I’ll call it good reason 5 out of 12.

Comments closed

Text Search in PostgreSQL

Jay Miller is looking for strings in all the wrong places:

I like to think of this like seeing a doctor. You can go to a family doctor and they can help you with most things. For specific results, it’s better to see a specialist who has a better understanding of the particular issue.

Search is the same way. At the end of the day, you get results but what you put into your search will affect what you get. That said there are some search methods that work better than others depending on your data.

Read on for several techniques that are available. I do think the headers denigrate LIKE/iLIKE a bit too much, as it works pretty well in many circumstances. But there are definitely good times to bring out the more powerful mechanisms, as this article shows.

Comments closed

OneLake Security ReadWrite Access

Kiefer Sheldon practices least privilege:

Many data teams face the same challenge: balancing the need for open collaboration with the responsibility of protecting sensitive information. As organizations grow, data often lives across multiple domains—some containing critical or confidential datasets—while partner teams may only need access to a subset of that information.

Until recently, maintaining this balance often meant trade-offs. Teams had to choose between a fragmented storage setup or overexposing data just to keep their workflows running smoothly.

Read on to see how this works.

Comments closed

Tracking Historical Changes in Microsoft Fabric

Kenneth Omorodion generates a snapshot:

In most modern businesses, by default, operational systems are managed in a way that only shows the current view of things in their data like active tickets, open incidents, active complaints, and daily sales. While this is a great way to monitor day-to-day reporting, it however tends to hide an important narrative for the business. For instance, it does not show how things have changed over time. It also does not tell a story on how previous periods compared to the current, in terms of the actual state of the data.

So, without a snapshot view implementation, there is no way to accurately view when data changes, and this may lead to a loss of the previous view forever with no way to retrieve that snapshot.

Click through to see how.

Comments closed

Using Haskell for Data Science

Jonathan Carroll has my attention:

I’ve been learning Haskell for a few years now and I am really liking a lot of the features, not least the strong typing and functional approach. I thought it was lacking some of the things I missed from R until I found the dataHaskell (www.datahaskell.org) project.

There have been several attempts recently to enhance R with some strong types, e.g.  vapour (vapour.run), typr (github.com), using {rlang}’s checks (josiahparry.com), and even discussions about implementations at the core level e.g.  in September 2025 (stat.ethz.ch) continued in November 2025 (stat.ethz.ch). While these try to bend R towards types, perhaps an all-in solution makes more sense.

In this post I’ll demonstrate some of the features and explain why I think it makes for a good (great?) data science language.

I’ve been a big fan of F# for data science work as well for similar reasons, so it was interesting to read this article on Haskell. H/T R-Bloggers.

Comments closed

IBM Acquires Confluent

Confluent has an announcement:

We are excited to announce that Confluent has entered into a definitive agreement to be acquired by IBM. After the transaction is closed (subject to customary closing conditions and regulatory approvals), together, IBM and Confluent will aim to provide a platform that unifies the world’s largest enterprises, unlocking data for cloud/microservices, accelerating time-to-value, and building the real-time data foundation required to scale AI across every organization. 

Whelp. I suppose it was bound to happen at some point, but I definitely can’t say this news pleases me.

Comments closed

Generating Shape-Bound Random Points in SQL Server

Sebastiao Pereira generates some numbers:

Random number generation is vital in computer science, supporting fields like optimization, simulation, robotics, and gaming. The quality, speed, and sometimes security of the generator can directly affect an algorithm’s correctness, performance, and competitiveness. In Python, random number generation is well-supported and widely used. In this article, we will look how to we can use SQL to do this.

Click through for several examples.

Comments closed

Renaming Unsaved Tabs in SSMS 22

Greg Low shares a tip:

Anyone who has worked with SSMS for long, will have the experience of opening a number of query windows for short-term work, not wanting to save the scripts, and then being unable to find the one they want in the drop-down list of scripts:

If you’re using the newest version of SSMS, read on to see how you can avoid that pain.

Comments closed

SQL Managed Instance Memory vs Cores

Kendra Little hits a pain point:

Microsoft recently announced that Azure SQL Managed Instance Next-gen General Purpose (GPv2) is now generally available. GPv2 brings significant storage performance improvements over GPv1, and if you’re using GPv1, you should plan to upgrade.

But GPv2 still has the same memory-to-core ratio problem that makes Managed Instance a rough deal for running SQL Server. SQL Server is engineered to use lots of memory—it’s a rare OLTP or mixed-OLTP workload that doesn’t need significant cache for reliable performance. We’ll have a look at the pricing math.

Read on for Kendra’s detailed thoughts on GPv2 versus GPv1 and also how GPv2 still has its warts.

Comments closed