Curated SQL Posts

Automating Management of Extended Statistics in PostgreSQL

Andrei Lepikhov builds an extension:

The extended statistics tool allows you to tell Postgres that additional statistics should be collected for a particular set of table columns. Why is this necessary? I will try to explain quickly using the example of an open power plant database, where the fuel type (primary_fuel) used by a power plant is implicitly associated with the country’s name.

Click through to learn more about what extended statistics are and the nature of the extension.
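To see why correlated columns like these trip up the planner, here is a small Python sketch built on a made-up handful of (country, primary_fuel) rows, not the real power plant dataset. By default, PostgreSQL treats conditions on different columns as independent and multiplies their selectivities; the sketch compares that estimate with the actual row count, then computes a simplified "degree" of the dependency country → primary_fuel in the spirit of the dependencies kind of extended statistics (an illustration, not PostgreSQL's exact implementation). All names here are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows: (country, primary_fuel). Not real data.
ROWS = [
    ("FRA", "Nuclear"), ("FRA", "Nuclear"), ("FRA", "Nuclear"), ("FRA", "Hydro"),
    ("NOR", "Hydro"), ("NOR", "Hydro"), ("NOR", "Hydro"), ("NOR", "Hydro"),
    ("POL", "Coal"), ("POL", "Coal"), ("POL", "Coal"), ("POL", "Coal"),
]


def independent_estimate(rows, country, fuel):
    """Estimate matching rows by multiplying per-column selectivities."""
    n = len(rows)
    sel_country = sum(1 for c, _ in rows if c == country) / n
    sel_fuel = sum(1 for _, f in rows if f == fuel) / n
    return n * sel_country * sel_fuel


def actual_count(rows, country, fuel):
    """Count rows that really match both conditions."""
    return sum(1 for c, f in rows if c == country and f == fuel)


def dependency_degree(rows):
    """Fraction of rows in groups where the country fully determines the fuel.

    A group of rows sharing a country 'supports' the dependency only if
    every row in that group has the same fuel.
    """
    groups = defaultdict(list)
    for country, fuel in rows:
        groups[country].append(fuel)
    supporting = sum(len(fuels) for fuels in groups.values() if len(set(fuels)) == 1)
    return supporting / len(rows)


if __name__ == "__main__":
    print(independent_estimate(ROWS, "POL", "Coal"))  # about 1.33 rows estimated
    print(actual_count(ROWS, "POL", "Coal"))          # 4 rows in reality
    print(dependency_degree(ROWS))                    # about 0.67
```

The independence assumption predicts about 1.3 rows where 4 actually match, and that kind of underestimate is what dependency statistics on the column pair are meant to correct.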

Handling SQL Agent Dates and Durations

Andy Mallon disparages some Microsoft intern’s summer of 1996 project:

SQL Agent’s schema is older than me. It handles dates, times, and durations like it’s 1980 by using integers instead of date/time data types. My buddy Aaron Bertrand talks more about Dating Responsibly so that you can have a good datetime with your own database.

I was writing a query to pull recent job failures from SQL Agent’s msdb job history, and knew that I didn’t want to deal with the wonky date/time formats. Specifically, I was querying msdb.dbo.sysjobhistory to find the Start Time, End Time, and Duration of job runs that failed. If you aren’t familiar with that table, you can look at it over in the docs.

Andy does point out the built-in function but then explains why a separate function is superior. Andy also happens to furnish that function, so check it out.
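Per the msdb documentation, sysjobhistory stores run_date as an integer in YYYYMMDD form and run_time and run_duration as integers in HHMMSS form, so a start time of 09:30:05 arrives as 93005 with its leading zero gone. Here is a minimal Python sketch of decoding those integers with plain integer arithmetic rather than string parsing; it is not Andy's T-SQL function, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def agent_start(run_date: int, run_time: int) -> datetime:
    """Decode run_date (YYYYMMDD) and run_time (HHMMSS) integers into a datetime."""
    year, month_day = divmod(run_date, 10000)
    month, day = divmod(month_day, 100)
    hours, minute_second = divmod(run_time, 10000)
    minutes, seconds = divmod(minute_second, 100)
    return datetime(year, month, day, hours, minutes, seconds)


def agent_duration(run_duration: int) -> timedelta:
    """Decode an HHMMSS integer duration; the hours part may exceed two digits."""
    hours, minute_second = divmod(run_duration, 10000)
    minutes, seconds = divmod(minute_second, 100)
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def agent_end(run_date: int, run_time: int, run_duration: int) -> datetime:
    """End time = decoded start time + decoded duration."""
    return agent_start(run_date, run_time) + agent_duration(run_duration)


if __name__ == "__main__":
    # 2025-02-14 09:30:05, running for 1h 02m 30s
    print(agent_end(20250214, 93005, 10230))
```

Using divmod on the integers sidesteps the missing-leading-zero problem entirely, because the digits are split by place value instead of by string position.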

Comparing Pandas to Other Libraries for Data Processing

Vidyasagar Machupalli performs a comparison:

As discussed in my previous article about data architectures emphasizing emerging trends, data processing is one of the key components in the modern data architecture. This article discusses various alternatives to the pandas library for better performance in your data architecture.

Data processing and data analysis are crucial tasks in the field of data science and data engineering. As datasets grow larger and more complex, traditional tools like pandas can struggle with performance and scalability. This has led to the development of several alternative libraries, each designed to address specific challenges in data manipulation and analysis.

This is by no means a comprehensive test, but it does show off quite a few libraries that fill a similar role to pandas.

Microsoft Fabric Shortcuts and Lakehouse Maintenance

Dennes Torres has a public service announcement:

I have written about lakehouse maintenance before, including maintenance across multiple lakehouses, published videos about this subject, and provided sample code for it.

However, there is one problem: maintenance should never be executed over shortcuts.

Tables require maintenance in their original location. As our solution evolves, we start using shortcuts, lots of them. Our maintenance code should always skip shortcuts and perform maintenance only on the actual tables.

Click through to see how you can differentiate shortcuts from actual tables and write code to avoid shortcuts.
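The core rule is simple: build the maintenance list from real tables only. Here is a minimal Python sketch of that filter. How a shortcut is detected is an assumption here: the hypothetical TableInfo.is_shortcut flag stands in for whatever metadata lookup you use (Dennes's post shows how he actually tells the two apart), and the generated statements use the Delta Lake OPTIMIZE and VACUUM commands.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableInfo:
    name: str
    is_shortcut: bool  # hypothetical flag: populate it from your own metadata lookup


def maintenance_statements(tables: List[TableInfo]) -> List[str]:
    """Return Delta maintenance commands for real tables only, skipping shortcuts."""
    statements = []
    for table in tables:
        if table.is_shortcut:
            # A shortcut points at data owned elsewhere; maintain it at its source.
            continue
        statements.append(f"OPTIMIZE {table.name}")
        statements.append(f"VACUUM {table.name}")
    return statements


if __name__ == "__main__":
    tables = [TableInfo("sales", False), TableInfo("customers_shortcut", True)]
    for stmt in maintenance_statements(tables):
        print(stmt)
```

In a Spark notebook, each statement could then be passed to spark.sql(); keeping the filter in one function means every maintenance routine inherits the same skip-shortcuts rule.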

Custom Visual Dialog Boxes Broken in Power BI Desktop February 2025

Marco Russo has some bad news for us:

As one of the founders of OKVIZ—a company dedicated to producing custom visuals—I have been following the recent developments in Power BI Desktop with particular concern. This issue, however, extends beyond our company and affects many other organizations that rely on custom visuals to enhance their business intelligence solutions. For this reason, I use my blog on SQLBI to reach a larger audience.

Click through for the problem. Marco has also posted an update: Microsoft has pledged to fix the problem in the March release of Power BI Desktop.

Cleaning up Azure Container Registries

Jess Pomfret does a bit of cleanup work:

Azure Container Registries can easily become cluttered with many versions of images. Did you know that each ACR SKU comes with a certain amount of included storage, and that when you go over it, you’ll pay overage charges? Let’s look at how to check your current storage, keep your registry nice and tidy with an ACR clean-up task, and monitor the storage levels so you’ll never pay extra again!

It’s easy to run up the disk space usage with a container registry, especially if you have automated builds running.
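Azure’s acr purge command, run as an ACR task, does the real cleanup server-side: it deletes tags matching a filter that are older than a given --ago duration, with an option to keep the most recent ones. The Python sketch below only illustrates that kind of age-based retention rule plus a check against an included-storage quota; the image list, sizes, and quota are hypothetical inputs, not data pulled from Azure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Image:
    tag: str
    pushed: datetime
    size_gib: float


def split_by_age(images: List[Image], now: datetime, max_age: timedelta,
                 keep_latest: int = 1) -> Tuple[List[Image], List[Image]]:
    """Return (keep, purge): purge images older than max_age,
    but always keep the keep_latest most recently pushed images."""
    newest_first = sorted(images, key=lambda img: img.pushed, reverse=True)
    keep, purge = [], []
    for index, image in enumerate(newest_first):
        if index < keep_latest or now - image.pushed <= max_age:
            keep.append(image)
        else:
            purge.append(image)
    return keep, purge


def overage_gib(images: List[Image], included_gib: float) -> float:
    """Storage above the included quota (0 if within quota).

    Naive: sums image sizes and ignores layers shared between images.
    """
    used = sum(img.size_gib for img in images)
    return max(0.0, used - included_gib)


if __name__ == "__main__":
    now = datetime(2025, 3, 1)
    images = [Image("v1", datetime(2025, 1, 1), 2.0),
              Image("v2", datetime(2025, 2, 1), 2.0),
              Image("v3", datetime(2025, 2, 28), 2.0)]
    keep, purge = split_by_age(images, now, timedelta(days=30))
    print([i.tag for i in keep], [i.tag for i in purge])
    print(overage_gib(keep, included_gib=5.0))
```

Always keeping the newest image guards against an age filter wiping out a repository that simply has not been rebuilt recently.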

Performance Comparison: Tally Table vs GENERATE_SERIES()

Steve Jones performs a pair of tests:

I had someone reach out about generate_series() recently, saying they hadn’t realized this was a new feature in SQL Server 2022. They were wondering if it was better than using a tally table.

I didn’t want to do an exhaustive test, but I thought I’d take a minute and try a couple of simple things just to see.

Steve built his tally table with a CTE that cross joins spt_values, one of the classic approaches. The performance differences aren’t enough on their own to justify large-scale changes if you’re already using a classic tally table, though it is good to see that GENERATE_SERIES() performs well. And if you’re not familiar with the power of a tally table, here is one great explanation of the concept.
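A cross join makes a handy number generator because row counts multiply: a source with r rows cross joined k times yields r^k rows, and ROW_NUMBER() over the result numbers them. Here is a small Python sketch of that idea, assuming a hypothetical ten-row DIGITS source in place of spt_values, next to the inclusive series that GENERATE_SERIES(start, stop, step) returns for a positive step.

```python
from itertools import islice, product

DIGITS = range(10)  # hypothetical 10-row source standing in for spt_values


def tally(n: int, copies: int = 4):
    """Number the rows of a cross join, like ROW_NUMBER() over a CTE-built tally table.

    copies cross-joined copies of DIGITS give 10 ** copies candidate rows;
    only the first n are taken, the way TOP (n) would limit them.
    """
    if n > len(DIGITS) ** copies:
        raise ValueError("cross join does not produce enough rows")
    rows = product(DIGITS, repeat=copies)  # lazily yields 10 ** copies tuples
    return [row_number for row_number, _ in enumerate(islice(rows, n), start=1)]


def generate_series(start: int, stop: int, step: int = 1):
    """Inclusive series, mirroring GENERATE_SERIES(start, stop, step) for positive steps."""
    return list(range(start, stop + 1, step))


if __name__ == "__main__":
    print(tally(5))
    print(generate_series(1, 10, 3))
```

The tally approach has to over-produce rows and then trim them, while GENERATE_SERIES() asks for exactly the range needed, which is one plausible reason the two end up close in simple tests.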

Searching for Power Query Functions via the Shared Keyword

Reza Rad shares something with us:

As I mentioned earlier in the Power BI online book, Power Query is a functional language. Knowing functions is your best helper when you work with a functional language. Fortunately, Power Query in both Excel and Power BI can use the #shared keyword to reveal a documentation library of all functions. I wrote about the #shared keyword almost 2.5 years ago, when Power Query was only an add-in for Excel. However, I still see people in my webinars who are new to the #shared keyword and are amazed at how helpful this little keyword is, so I decided to explain it with the new Power BI. With the method in this post, you can easily find any function you want in Power Query, and you won’t need an internet connection to search for functions.

Click through to see what #shared can do for you.

Random Functions in PostgreSQL 17

Leo Hsu and Regina Obe look at updates to the random() function in PostgreSQL:

Have you ever wanted to get a random integer between 1 and 10 and been a little annoyed by the slightly cryptic code you had to write in PostgreSQL? PostgreSQL 17’s random functions make that simpler. Sometimes it’s the small changes that bring the most joy.

Click through to see what it took to get a random integer or floating point number prior to PostgreSQL 17 and how it’s a fair bit simpler today.
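Before PostgreSQL 17, a common idiom for an integer from 1 to 10 was floor(random() * 10 + 1)::int, relying on random() returning a value where 0.0 <= x < 1.0. PostgreSQL 17 adds random(min, max), which returns a value in the inclusive range directly, so random(1, 10) does the job. The Python sketch below mirrors the older scaling arithmetic to show why it works; the function names are hypothetical.

```python
import math
import random


def legacy_random_int(low: int, high: int, r: float) -> int:
    """Pre-PostgreSQL 17 idiom: floor(r * (high - low + 1) + low), with 0 <= r < 1."""
    return math.floor(r * (high - low + 1) + low)


def random_int(low: int, high: int) -> int:
    """Inclusive random integer, analogous to PostgreSQL 17's random(low, high)."""
    return legacy_random_int(low, high, random.random())


if __name__ == "__main__":
    print(legacy_random_int(1, 10, 0.0))       # smallest r maps to 1
    print(legacy_random_int(1, 10, 0.999999))  # r just below 1 maps to 10
    print(random_int(1, 10))
```

The + 1 in the width (high - low + 1) is the easy part to get wrong: leave it out and the top value of the range can never be returned.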
