Press "Enter" to skip to content

Curated SQL Posts

Diagnosing Slow Commits in Postgres

Laurenz Albe offers up some thoughts:

Sometimes one of our customers looks at the most time consuming statements in a database (either with pg_stat_statements or with pgBadger) and finds COMMIT in the high ranks. Normally, COMMIT is a very fast statement in PostgreSQL, so that is worth investigating. In this article, I will explore the possible reasons for a slow COMMIT and discuss what you can do about it.

Read on for those reasons.

Comments closed

Azure SQL Managed Instanced Update Policies

Rod Edwards is not amused:

Ah, SQL 2022, the release that finally brought box SQL and Azure managed instance closer together. We have wonderful toys such as Managed Instance Link, allowing us to connect our SQL 2022 on prem instances with Managed Instance Link. It waslike the first real effort to integrate modern Azure offerings with those who also need / prefer an On Prem presence.

Rob Litjens has a follow-up on this:

I prepared some questions:

  1. What polices does Managed Instance have?
  2. Why did Microsoft implement the ‘Always-up-to-date update policy’ Policy?
  3. Why is it named Policy?
  4. Do we need to update our Azure scripts to implement it (immediately)?
  5. Is there impact on offerings like Managed Instance Link

Do read both of these as they combine for a rounded perspective of the issue Rod brought up.

Comments closed

Recapping an Orchestration Framework

Martin Schoombee wraps up a series:

Frameworks are extremely useful when they are thoughtfully designed and implemented. I have seen both sides of the coin, but what I probably see the most of is a lack of any sort of framework. What I typically see are some naming conventions and coding standards, but many companies miss the opportunity to take it one step further and reduce the inefficiencies of repetitive tasks. There’s a ton of repetition in ETL processes, and in my opinion that gives us a really good opportunity to streamline the way in which we are doing things with a well designed framework.

Read on for Martin’s notes to keep in mind, as well as where to go from here.

Comments closed

Power BI Studio 2.0 Released

Gerhard Brueckl moves fast and doesn’t break things:

Due to the great feedback I have received for the first version of my VSCode extension to mange Power BI objects from within VSCode I decided to continue working on it and am finally happy to share that I am releasing a new version – v2.0!

If you already had the previous version installed in VSCode, you do not have to do anything as it will update automatically. If you are a new user, you can install it from the gallery or search for “Power BI Studio” in the VSCode extensions tab.

Click through for the list of updates, as well as how you can install the Visual Studio Code extension.

Comments closed

Looping through Data in Microsoft Fabric PySpark Notebooks

Gilbert Quevauvilliers builds a loop:

Continuing with my existing blog series on what I’m learning with notebooks and PySpark.

Today, I’m going to explain to you how I found a way to loop through data in a notebook.

In this example, I’m going to show you how I loop through a range of dates, which can then be used in a subsequent query to extract data by passing through each date into a DAX query.

Click through for Gilbert’s example. Here’s an alternative using something called a list comprehension. First, build a function that does what you want to do—that’d be the innards of Gilbert’s Python code, lines 31-54.

def perform_dax_query(row):
    var_Date = row["Date"]
    ...
    display(df_DAX_QueryResult)

Then, call that function for each row:

[perform_dax_query(row) for row in data_collect]

In this particular scenario, I’d personally stick with Gilbert’s composition, but in cases where you’re transforming a list of elements into a new list—for example, if you’re performing some data cleanup for each row in a list and you want the output to be a new list with cleaned-up data—then the list comprehension works really well.

Comments closed

Dynamic Historical Partition Refresh in Power BI

Marc Lelijveld digs into partition refreshing:

I’ve heard the question pretty often from customers: “You told me to use incremental refresh, but how can I regularly run a full load or refresh onder partitions?” Well, there are perfect ways to do this using Tabular Editor or SQL Server Management Studio. But this often includes manual work to trigger the processing.

Today, this question was asked again to me. I thought, there should be a smarter way to do this. Since I recently explored more in the wonderful world of Fabric Notebooks and Python, decided to dive a bit deeper in this world and see if it is possible to script something like this using Semantic Link. And obviously, the answer is “Yes!”

Read on to learn how to do it with a bit of Python and Microsoft Fabric’s Semantic Link library (sempy).

Comments closed

An Introduction to the Native Execution Engine in Microsoft Fabric

Sandeep Pawar gives us the gentle version:

At MS Build, Microsoft announced availability of Native Execution Engine in Fabric. The product team will have detailed documentation and technical details but I will attempt to provide an ELI5 version of what it means and how it works at a 30000 ft level.

Read on to learn what the Native Execution Engine is and why Microsoft would use it versus running everything in the Java Virtual Machine like other services in the broader Hadoop ecosystem have historically done (at least at the beginning, before they built their own native execution engines!).

Comments closed

Cross-Correlation of Time Series to Identify Time Lags in SAS

Kevin Scott and David Frede notice the pattern:

Batch manufacturing involves producing goods in batches rather than in a continuous stream. This approach is common in industries such as pharmaceuticals, chemicals, and materials processing, where precise control over the production process is essential to ensure product quality and consistency. One critical aspect of batch manufacturing is the need to manage and understand inherent time delays that occur at various stages of the process.

In the glass manufacturing industry, which operates under the principles of batch manufacturing, precisely controlling the furnace temperature is essential for producing high-quality glass. The process involves melting raw materials like silica sand, soda ash, and limestone at high temperatures, where maintaining the correct temperature is crucial.

Read on to see an example of how you can automate the identification of a time lag using cross-correlation techniques.

Comments closed

Handling Imbalanced Data in Classification Algorithms

Matthew Mayo shares a few tips:

Imperfect data is the norm rather than the exception in machine learning. Comparably common is the binary class imbalance when the classes in a trained data remains majority/minority class, or is moderately skewed. Imbalanced data can undermine a machine learning model by producing model selection biases. Therefore in the interest of model performance and equitable representation, solving the problem of imbalanced data during training and evaluation is paramount.

This article will define imbalanced data, resampling strategies as solution, appropriate evaluation metrics, kinds of algorithmic approaches, and the utility of synthetic data and data augmentation to address this imbalance.

Read on for five recommendations, starting with what you should know and then offering up four options for what you can do.

Comments closed