Day: May 29, 2024

healthyR.data Package Updates

Steven Sanderson has an update for us:

I’m excited to share the latest updates to the healthyR.data R package! This release brings new functionality and minor improvements, all aimed at making your data management tasks easier and more efficient. Here’s a breakdown of what’s new:

Read on for information on four new functions and a couple of bugfixes.

Diagnosing Slow Commits in Postgres

Laurenz Albe offers up some thoughts:

Sometimes one of our customers looks at the most time consuming statements in a database (either with pg_stat_statements or with pgBadger) and finds COMMIT in the high ranks. Normally, COMMIT is a very fast statement in PostgreSQL, so that is worth investigating. In this article, I will explore the possible reasons for a slow COMMIT and discuss what you can do about it.

Read on for those reasons.

Azure SQL Managed Instance Update Policies

Rod Edwards is not amused:

Ah, SQL 2022, the release that finally brought box SQL and Azure managed instance closer together. We have wonderful toys such as Managed Instance Link, allowing us to connect our SQL 2022 on prem instances with Managed Instance Link. It was like the first real effort to integrate modern Azure offerings with those who also need / prefer an On Prem presence.

Rob Litjens has a follow-up on this:

I prepared some questions:

  1. What policies does Managed Instance have?
  2. Why did Microsoft implement the ‘Always-up-to-date update policy’ Policy?
  3. Why is it named Policy?
  4. Do we need to update our Azure scripts to implement it (immediately)?
  5. Is there impact on offerings like Managed Instance Link?

Do read both of these as they combine for a rounded perspective of the issue Rod brought up.

Recapping an Orchestration Framework

Martin Schoombee wraps up a series:

Frameworks are extremely useful when they are thoughtfully designed and implemented. I have seen both sides of the coin, but what I probably see the most of is a lack of any sort of framework. What I typically see are some naming conventions and coding standards, but many companies miss the opportunity to take it one step further and reduce the inefficiencies of repetitive tasks. There’s a ton of repetition in ETL processes, and in my opinion that gives us a really good opportunity to streamline the way in which we are doing things with a well designed framework.

Read on for Martin’s notes to keep in mind, as well as where to go from here.

Power BI Studio 2.0 Released

Gerhard Brueckl moves fast and doesn’t break things:

Due to the great feedback I have received for the first version of my VSCode extension to manage Power BI objects from within VSCode I decided to continue working on it and am finally happy to share that I am releasing a new version – v2.0!

If you already had the previous version installed in VSCode, you do not have to do anything as it will update automatically. If you are a new user, you can install it from the gallery or search for “Power BI Studio” in the VSCode extensions tab.

Click through for the list of updates, as well as how you can install the Visual Studio Code extension.

Looping through Data in Microsoft Fabric PySpark Notebooks

Gilbert Quevauvilliers builds a loop:

Continuing with my existing blog series on what I’m learning with notebooks and PySpark.

Today, I’m going to explain to you how I found a way to loop through data in a notebook.

In this example, I’m going to show you how I loop through a range of dates, which can then be used in a subsequent query to extract data by passing through each date into a DAX query.

Click through for Gilbert’s example. Here’s an alternative using something called a list comprehension. First, build a function that does what you want to do—that’d be the innards of Gilbert’s Python code, lines 31-54.

def perform_dax_query(row):
    var_Date = row["Date"]
    ...
    display(df_DAX_QueryResult)

Then, call that function for each row:

[perform_dax_query(row) for row in data_collect]

In this particular scenario, I’d personally stick with Gilbert’s composition, but in cases where you’re transforming a list of elements into a new list—for example, if you’re performing some data cleanup for each row in a list and you want the output to be a new list with cleaned-up data—then the list comprehension works really well.
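
To make that last case concrete, here is a minimal sketch of a cleanup-style list comprehension. The row shape, the sample values, and the clean_row helper are all hypothetical stand-ins of mine, not anything from Gilbert’s notebook.

```python
def clean_row(row):
    """Return a tidied copy of one row (a dict), leaving the input untouched."""
    return {
        "Date": row["Date"].strip(),              # trim stray whitespace
        "Region": row["Region"].strip().title(),  # trim and normalize casing
    }


# Hypothetical sample data standing in for a list of collected rows.
raw_rows = [
    {"Date": " 2024-05-27", "Region": "north "},
    {"Date": "2024-05-28 ", "Region": " SOUTH"},
]

# The comprehension builds a brand-new list; raw_rows itself is not modified.
cleaned_rows = [clean_row(row) for row in raw_rows]

print(cleaned_rows)
```

The difference from the display-only case above is that here the resulting list is the whole point. When the function exists only for its side effects, a plain for loop tends to communicate intent more clearly.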

Dynamic Historical Partition Refresh in Power BI

Marc Lelijveld digs into partition refreshing:

I’ve heard the question pretty often from customers: “You told me to use incremental refresh, but how can I regularly run a full load or refresh older partitions?” Well, there are perfect ways to do this using Tabular Editor or SQL Server Management Studio. But this often includes manual work to trigger the processing.

Today, this question was asked again to me. I thought, there should be a smarter way to do this. Since I recently explored more in the wonderful world of Fabric Notebooks and Python, I decided to dive a bit deeper in this world and see if it is possible to script something like this using Semantic Link. And obviously, the answer is “Yes!”

Read on to learn how to do it with a bit of Python and Microsoft Fabric’s Semantic Link library (sempy).
