Press "Enter" to skip to content

Curated SQL Posts

Solid Practices for Power BI

Paul Turley has the beginnings of a new series:

It’s time for a refresher and reboot on this important topic. Much has changed in the Power BI world, the core design principles remain the same, practices and architecture patterns have evolved over the past few years. Power BI has grown up in the enterprise space and Microsoft Fabric now adds new options and capabilities. Back in 2020, I began writing a series of blog posts titled “Doing Power BI the Right Way” and it has become my mission to evolve and maintain a current collection of the most important best practice recommendations. This has been my passion and topic of several conference talks, user group sessions and a book currently in development for O’Reilly that will help you prepare for the PL-300 Power BI Analyst exam and then guide you apply enterprise best practices in your solutions.

I work with hundreds of consulting clients who go through the same cycles, having the same experiences, facing the same challenges, many making the same mistakes, and many learning some of the same lessons. The purpose of this series is to share those lessons with you.

Click through for the overview, as well as an outline of what Paul will include in this series.

Comments closed

Viewing Storage Consumption in Microsoft Fabric

Gilbert Quevauvilliers wants to know about storage utilization in Microsoft Fabric:

This blog post will show you how to understand what is consuming your Fabric Storage.

If you want to know how I got this data, please read my previous blog post View all your Storage consumed in Microsoft Fabric – Lakehouse Files, Tables and Warehouses – FourMoo

With this Semantic model below, I could also create alerts to notify based on certain thresholds. For example, if total storage in a single App workspace is more than 100GB send me an alert (This could be done using Power Automate). Or it could be on too many files being stored, or even looking at the Parquet file sizes and if they are too small they would then need to be optimised (for better performance).

Click through for the report.

Comments closed

Security Risk Profile in AI-Generated Code

Jerome Robert reviews the papers:

As such, nowadays, almost all developers use some form of AI-generated code — and they absolutely should. AI tools make developers’ lives easier by leveraging the knowledge cultivated by the development community over time and across the globe to overcome obstacles that, while potentially new and challenging to them, have long been addressed. They can reasonably trust that code to perform the function they want to achieve — and can test it to be sure.

But can they trust that code to be secure? Absolutely not. With all that time and work spent committing functional code, just as much, if not more, is spent navigating the security backlog afterward.

Click through for a summary of two recent academic papers, as well as links to the papers themselves.

Comments closed

An Intro to Power Apps

Vikash Kumar has a two-parter. First up is an introduction to Power Apps:

Microsoft PowerApps is a tool within the Microsoft Power Platform suite that enables users to create custom apps without extensive coding knowledge. It’s a low-code platform that empowers both technical and non-technical users to build apps quickly. Whether you’re looking to automate business processes or solve specific problems, PowerApps provides the tools you need.

Next is a primer on Canvas Apps in Power Apps:

Canvas Apps are applications where you start with a blank canvas and design the user interface from scratch. You have complete control over the layout, appearance, and functionality of the app by dragging and dropping components onto the canvas. This makes Canvas Apps ideal for creating custom apps that match your specific design and workflow requirements.

Power Apps is something I’ve never really gotten into. I don’t think it’s extremely intuitive, which can be a drawback in a tool whose primary audience is non-developers.

Comments closed

Comparing PL/SQL to PL/pgSQL

Umair Shahid switches dialects:

To address these limitations, database systems like Oracle and PostgreSQL offer procedural extensions to SQL. Oracle’s PL/SQL and PostgreSQL’s PL/pgSQL allow developers to implement more advanced logic, including loops, conditionals, error handling, and transaction control—all within the database. These procedural languages enhance SQL’s capabilities, making it possible to write complex routines that can execute closer to the data, thus improving performance and maintainability.

As an Oracle DBA transitioning to PostgreSQL, understanding the differences between PL/SQL and PL/pgSQL is critical. This article explores the nuances between the two languages, covering syntax, features, and practical migration tips, ensuring you can leverage PL/pgSQL effectively in your PostgreSQL environment.

Read on for several differences between the two languages.

Comments closed

Power BI Automatic Aggregations and Databricks

Katie Cummiskey, et al, do a bit of caching:

Automatic aggregations streamline the process of improving BI query performance by maintaining an in-memory cache of aggregated data. This means that a substantial portion of report queries can be served directly from this in-memory cache instead of relying on the backend data sources. Power BI automatically builds these aggregations using AI based on your query patterns and then intelligently decides which queries can be served from the in-memory cache and which are routed to the data source through DirectQuery, resulting in faster visualizations and reduced load on the backend systems.

Click through to learn more about automatic aggregations, which SKUs of Power BI / Fabric are eligible, and how you can enable it for data coming from Databricks.

Comments closed

Incremental Integrity Checks for SQL Server Databases

Eitan Blumin eats the elephant:

Traditional database integrity checks in SQL Server can be time-consuming and resource-intensive, especially for large databases, even when using super cool tools like Ola Hallengren’s maintenance solution.

To address this challenge, I developed a TSQL script for performing incremental integrity checks, which significantly optimizes the process and reduces its impact on the server.

Click through for that script and notes from Eitan.

I’ll also shill for Minion CheckDB whenever I can. I was an early beta tester for the product and it was designed specifically for dealing with large databases.

Comments closed

Filtered Indexes in PostgreSQL

Hubert Lubaczewski digs into a performance issue:

So, we have some databases on PostgreSQL 14 (yes, I know, we should upgrade, but it is LONG project to upgrade, so it’s not really on the table now).

Last week someone noticed problems with website. These were tracked to queries getting killed because of statement_timeout of 3 minutes.

The query was relatively simple (most of the identifiers were obfuscated, but tried to keep them somewhat memorable):

Click through for the story, analysis of the problem, and how creating a filtered index worked in this case. Filtered indexes are a beautiful thing when the optimizer knows how to make use of them.

Comments closed

Tips for Saving Money in the Cloud

Joey D’Antoni is speaking my language:

In the early days of cloud computing, there was a lot of talk about how the cloud was cheaper going to be cheaper than on-premises computing. Also, in the early days of cloud computing you could only get storage with like 1/1000 of the IOPs of the laptop I’m typing on right now, and the largest VM you could buy had maybe 32 GB of RAM. Things changed for the better, services got a lot better and richer, and in 2024, it’s not uncommon for your monthly cloud bill to resemble a phone number. I’ve done a lot of work with both clients, and in training to optimize cloud infrastructure to meet a better cost profile. You’ll note I didn’t inherently jump to lower prices there—the cheapest solution isn’t always best. Let’s talk about money in the cloud and how it works.

Read on for Joey’s tips. To add a couple more from my own:

  • Make use of spot instances for VMs whenever you can. Spot instances can save you a lot of money over reserved instances, although you will need to have flexibility in how you do your work because your spot instance will disappear after somebody else is willing to pay a bit more than you for that hardware.
  • Look into dev/test subscriptions, that are part of a Visual Studio subscription. Use those for non-production environments because you’ll save money on licensing Windows and SQL Server, as well as getting discounts on certain platform-as-a-service offerings like Azure Application Services.
  • Be ruthless about cleaning up technical debt. Any sort of inefficiency costs money when dealing with variable resources.
  • If you’re on Azure, go through the Well-Architected Framework review process. This can take several days to complete if you take it completely seriously, but it does an excellent job of pointing out inefficiencies that are costing you money.
Comments closed

Minimizing Latency in Kafka Streaming Applications using APIs

Abhishek Goswami doesn’t want to slow down the stream:

Kafka is widely adopted for building real-time streaming applications due to its fault tolerance, scalability, and ability to process large volumes of data. However, in general, Kafka streaming consumers work best only in an environment where they do not have to call external APIs or databases. In a situation when a Kafka consumer must make a synchronous database or API call, the latency introduced by network hops or I/O operations adds up and accumulates easily (especially when the streaming pipeline is performing an initial load of a large volume of data before starting CDC). This can significantly slow down the streaming pipeline and result in the blowing of system resources impacting the throughput of the pipeline. In extreme situations, this may even become unsustainable as Kafka consumers may not be able to commit offsets due to increased latency before the next polling call and get continuously rebalanced by the broker, practically not processing anything yet incrementally consuming more system resources as time passes.

This is a real problem faced by many streaming applications. In this article, we’ll explore some effective strategies to minimize latency in Kafka streaming applications where external API or database calls are inevitable. We’ll also compare these strategies with the alternative approach of separating out the parts of the pipeline that require these external interactions into a separate publish/subscribe-based consumer.

Read on to understand the causes of this latency and several patterns you can use to limit it.

Comments closed