Press "Enter" to skip to content

Curated SQL Posts

Comparing PL/SQL to PL/pgSQL

Umair Shahid switches dialects:

To address these limitations, database systems like Oracle and PostgreSQL offer procedural extensions to SQL. Oracle’s PL/SQL and PostgreSQL’s PL/pgSQL allow developers to implement more advanced logic, including loops, conditionals, error handling, and transaction control—all within the database. These procedural languages enhance SQL’s capabilities, making it possible to write complex routines that can execute closer to the data, thus improving performance and maintainability.

As an Oracle DBA transitioning to PostgreSQL, understanding the differences between PL/SQL and PL/pgSQL is critical. This article explores the nuances between the two languages, covering syntax, features, and practical migration tips, ensuring you can leverage PL/pgSQL effectively in your PostgreSQL environment.

Read on for several differences between the two languages.
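
To give a flavor of the dialect, here is a minimal PL/pgSQL sketch (the table and function names are hypothetical, not from the article) showing a loop, a conditional, and an exception handler; the PL/SQL equivalent would use an IS/AS block and Oracle's exception names rather than the $$ ... $$ LANGUAGE plpgsql wrapper:

-- Minimal PL/pgSQL sketch: loop, conditional, and error handling in one routine.
-- Table and names are hypothetical.
CREATE OR REPLACE FUNCTION count_large_orders(p_threshold numeric)
RETURNS integer
AS $$
DECLARE
    v_count  integer := 0;
    v_amount numeric;
BEGIN
    FOR v_amount IN SELECT total_amount FROM orders LOOP
        IF v_amount > p_threshold THEN
            v_count := v_count + 1;
        END IF;
    END LOOP;
    RETURN v_count;
EXCEPTION
    WHEN undefined_table THEN
        RAISE NOTICE 'orders table does not exist';
        RETURN 0;
END;
$$ LANGUAGE plpgsql;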


Power BI Automatic Aggregations and Databricks

Katie Cummiskey, et al., do a bit of caching:

Automatic aggregations streamline the process of improving BI query performance by maintaining an in-memory cache of aggregated data. This means that a substantial portion of report queries can be served directly from this in-memory cache instead of relying on the backend data sources. Power BI automatically builds these aggregations using AI based on your query patterns and then intelligently decides which queries can be served from the in-memory cache and which are routed to the data source through DirectQuery, resulting in faster visualizations and reduced load on the backend systems.

Click through to learn more about automatic aggregations, which SKUs of Power BI / Fabric are eligible, and how you can enable it for data coming from Databricks.


Incremental Integrity Checks for SQL Server Databases

Eitan Blumin eats the elephant:

Traditional database integrity checks in SQL Server can be time-consuming and resource-intensive, especially for large databases, even when using super cool tools like Ola Hallengren’s maintenance solution.

To address this challenge, I developed a TSQL script for performing incremental integrity checks, which significantly optimizes the process and reduces its impact on the server.

Click through for that script and notes from Eitan.

I’ll also shill for Minion CheckDB whenever I can. I was an early beta tester for the product and it was designed specifically for dealing with large databases.
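
Eitan's actual script is in the linked post; purely to illustrate the incremental idea (this is a hypothetical sketch, not his code), the general pattern is to check a batch of tables per run with DBCC CHECKTABLE rather than running DBCC CHECKDB against the whole database at once:

-- Hypothetical sketch of an incremental approach: check a small batch of tables
-- per run, so the full database gets covered over several maintenance windows.
-- A real script would track which tables were checked and when.
DECLARE @TableName nvarchar(300);

DECLARE table_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT TOP (25) QUOTENAME(s.name) + N'.' + QUOTENAME(t.name)
    FROM sys.tables AS t
    INNER JOIN sys.schemas AS s
        ON t.schema_id = s.schema_id
    ORDER BY t.object_id;

OPEN table_cursor;
FETCH NEXT FROM table_cursor INTO @TableName;

WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC (N'DBCC CHECKTABLE (''' + @TableName + N''') WITH NO_INFOMSGS;');
    FETCH NEXT FROM table_cursor INTO @TableName;
END;

CLOSE table_cursor;
DEALLOCATE table_cursor;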


Filtered Indexes in PostgreSQL

Hubert Lubaczewski digs into a performance issue:

So, we have some databases on PostgreSQL 14 (yes, I know, we should upgrade, but it is a LONG project to upgrade, so it’s not really on the table now).

Last week someone noticed problems with the website. These were tracked to queries getting killed because of a statement_timeout of 3 minutes.

The query was relatively simple (most of the identifiers were obfuscated, but I tried to keep them somewhat memorable).

Click through for the story, analysis of the problem, and how creating a filtered index worked in this case. Filtered indexes are a beautiful thing when the optimizer knows how to make use of them.
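
For a quick sense of what that looks like (the table and predicate here are made up, not the ones from the post), a filtered index (a partial index, in PostgreSQL terms) only indexes rows matching its WHERE clause, so a query with the same predicate can use a much smaller and more selective index:

-- Hypothetical partial index: only the 'pending' rows are indexed.
CREATE INDEX idx_orders_pending_customer
    ON orders (customer_id)
    WHERE status = 'pending';

-- A query with a matching predicate can then use the partial index:
SELECT *
FROM orders
WHERE status = 'pending'
  AND customer_id = 42;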


Tips for Saving Money in the Cloud

Joey D’Antoni is speaking my language:

In the early days of cloud computing, there was a lot of talk about how the cloud was going to be cheaper than on-premises computing. Also, in the early days of cloud computing you could only get storage with like 1/1000 of the IOPs of the laptop I’m typing on right now, and the largest VM you could buy had maybe 32 GB of RAM. Things changed for the better, services got a lot better and richer, and in 2024, it’s not uncommon for your monthly cloud bill to resemble a phone number. I’ve done a lot of work, both with clients and in training, to optimize cloud infrastructure to meet a better cost profile. You’ll note I didn’t inherently jump to lower prices there—the cheapest solution isn’t always best. Let’s talk about money in the cloud and how it works.

Read on for Joey’s tips. To add a couple more of my own:

  • Make use of spot instances for VMs whenever you can. Spot instances can save you a lot of money over reserved instances, although you will need to have flexibility in how you do your work because your spot instance will disappear when somebody else is willing to pay a bit more than you for that hardware.
  • Look into dev/test subscriptions, which are part of a Visual Studio subscription. Use those for non-production environments because you’ll save money on licensing Windows and SQL Server, as well as get discounts on certain platform-as-a-service offerings like Azure Application Services.
  • Be ruthless about cleaning up technical debt. Any sort of inefficiency costs money when dealing with variable resources.
  • If you’re on Azure, go through the Well-Architected Framework review process. This can take several days to complete if you take it completely seriously, but it does an excellent job of pointing out inefficiencies that are costing you money.

Minimizing Latency in Kafka Streaming Applications using APIs

Abhishek Goswami doesn’t want to slow down the stream:

Kafka is widely adopted for building real-time streaming applications due to its fault tolerance, scalability, and ability to process large volumes of data. However, in general, Kafka streaming consumers work best only in an environment where they do not have to call external APIs or databases. In a situation when a Kafka consumer must make a synchronous database or API call, the latency introduced by network hops or I/O operations adds up and accumulates easily (especially when the streaming pipeline is performing an initial load of a large volume of data before starting CDC). This can significantly slow down the streaming pipeline and result in the blowing of system resources impacting the throughput of the pipeline. In extreme situations, this may even become unsustainable as Kafka consumers may not be able to commit offsets due to increased latency before the next polling call and get continuously rebalanced by the broker, practically not processing anything yet incrementally consuming more system resources as time passes.

This is a real problem faced by many streaming applications. In this article, we’ll explore some effective strategies to minimize latency in Kafka streaming applications where external API or database calls are inevitable. We’ll also compare these strategies with the alternative approach of separating out the parts of the pipeline that require these external interactions into a separate publish/subscribe-based consumer.

Read on to understand the causes of this latency and several patterns you can use to limit it.


Updating Microsoft Fabric Warehouses via Power Apps

Shabnam Watson troubleshoots an issue:

One of my recent explorations with Microsoft Fabric was integrating Power Apps with a Fabric Warehouse—both in a standalone Power Apps app and as an embedded visual within a Power BI report to enable writeback. My goal was simple: to enable Power Apps to display and update records from a table in the Fabric Warehouse. Initially, I turned to the three-screen template apps to get started; however, while it displayed the records, it failed to update them. This led me to dive deeper into how Fabric Warehouse differs from other SQL data sources when it comes to Power Apps and to find a workaround.

Read on to see how it all works.


Don’t Trust TRUSTWORTHY

Chad Callihan talks about an untrustworthy setting:

TRUSTWORTHY is a database property that can have far-reaching security consequences when turned ON. Let’s take a brief look at what the TRUSTWORTHY property is and whether it’s worth turning on, even when it is a potential fix to your problems.

Chad links to a DBA Stack Exchange post from Solomon Rutzky concerning module signing, which is a good opportunity for me to plug Solomon’s modulesigning.info. This is the correct answer, not TRUSTWORTHY or any of its ilk (EXECUTE AS, cross-database ownership chaining, etc.).
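
If you want to see where you stand today, finding databases with the property enabled is a one-liner, and the module signing alternative, in miniature, looks something like the sketch below (the certificate, password, and procedure names are hypothetical):

-- Find databases with TRUSTWORTHY turned on; ideally this returns only msdb.
SELECT name, is_trustworthy_on
FROM sys.databases
WHERE is_trustworthy_on = 1;

-- Module signing in miniature (hypothetical names): sign the procedure with a
-- certificate, then grant the needed permissions to a principal created from
-- that certificate instead of flipping TRUSTWORTHY on.
CREATE CERTIFICATE ElevatedPermsCert
    ENCRYPTION BY PASSWORD = 'StrongPasswordHere1!'
    WITH SUBJECT = 'Signing certificate for cross-scope permissions';

ADD SIGNATURE TO dbo.SomeProcedure
    BY CERTIFICATE ElevatedPermsCert
    WITH PASSWORD = 'StrongPasswordHere1!';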


Glyphs and Character Encodings in PostgreSQL

Christophe Pettis has a two-parter (so far). First up is an explanation of several linguistic terms:

This is part one of a series on PostgreSQL and collations, and how to use them without tears. This is an introduction to the general concepts of glyphs, character encodings, collations, and locales.

Part two covers character encodings in PostgreSQL:

From the point of view of the computer, a character string is just a sequence of bytes (maybe terminated by a zero byte, maybe with a length). If the only thing PostgreSQL had to do with character strings was store them and return them to the client, it could just ignore that character encodings even exist.

However, databases don’t just store character strings: they also compare them, build indexes on them, change them to upper case, do regex searches on them, and other things that mean they need to know what the characters are and how to manipulate them. So, PostgreSQL needs to know what character encoding applies to the text it is storing.

Read on for a detailed explanation for PostgreSQL. A lot of this also applies to SQL Server, though there are some inevitable differences that pop up.
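
If you want to see how this plays out on your own instance, the relevant settings are easy to query (these are standard PostgreSQL catalog queries, not something specific to Christophe’s posts):

-- Encoding of the current server/database connection.
SHOW server_encoding;

-- Encoding and default collation/ctype of every database.
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
FROM pg_database;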


Finding a Good Cost Threshold for Parallelism

Jared Westover goes on a quest:

Given modern hardware, you might hear that the default setting of 5 for the Cost Threshold for Parallelism (CTFP) is far too low. However, people are left with a decision: Should they change it or leave it alone? If I change it and the performance gets worse, I’ll be left with egg on my face. What exactly is the benefit of increasing it, especially for smaller-cost queries?

Read on to learn more about what Cost Threshold for Parallelism is, how you can set it, and a simple example of how the setting can affect you. Jared also has some links to great resources that I highly recommend you check out.
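
For reference, checking and changing the setting is a quick sp_configure call; the value of 50 below is purely illustrative, not a recommendation:

-- Cost Threshold for Parallelism is an advanced option, so expose it first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Check the current value, then raise it from the default of 5.
EXEC sp_configure 'cost threshold for parallelism';
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;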
