Press "Enter" to skip to content

Day: August 20, 2024

Loops in R

Ben Johnston spins in circles:

Welcome back to my R for SEO series. We’re in the home stretch now, with part seven. Today, we’re going to be looking at different ways that we can run functions or commands over a series of elements using the various kinds of loops that exist in R.

If you’ve followed along so far, or you’ve tried some experimentation of your own, you’ve probably encountered loops and applys along the way. I know early on in my R journey, it very much seemed like pot luck as to which apply I should use, or whether a loop was easier, so hopefully today’s piece will start to clear that up for you a little.

I know that most programming courses cover these elements earlier, but for me, it really didn’t click until I’d learned more about the other areas we’ve covered in this series, so that’s why I’ve placed it here.

Read on for examples of For loops and While loops, as well as breaking conditions.

Ben also talks about loops versus using the apply() series of functions (or equivalent map() functions in the purrr library). I tend to lean heavily on using the mapping function approach when there are no side effects, and use for loops when there are. H/T R-Bloggers.

Comments closed

Searching for Multiple Patterns in R with grepl

Steven Sanderson looks for the pattern:

Hello, fellow useRs! Today, we’re going to expand on previous uses of the grepl() function where we looked for a single pattern and move onto to a search for multiple patterns within strings. Whether you’re cleaning data, conducting text analysis, grepl can be your go-to tool. Let’s break down the syntax, offer a practical example, and guide you on a path to proficiency.

Read on for all of that.

Comments closed

Query Re-Optimization in Postgres

Andrei Lepikhov walks through an interesting scenario:

What was the impetus to begin this work? It was caused by many real cases that may be demonstrated clearly by the Join Order Benchmark. How much performance do you think Postgres loses if you change its preference of employing parallel workers from one to zero? Two times regression? What about 10 or 100 times slower?

The black line in the graph below shows the change in execution time of each query between two cases: with parallel workers disabled and with a single parallel worker per gather allowed. For details, see the test script and EXPLAINs, with and without parallel workers.

Click through for an overview of what Andrei wrote, including architectural notes. But stick around until the end to see just how difficult the challenge is to re-optimize without making performance worse in the end.

Comments closed

PowerShell Script to Move Azure SQL DB from General Purpose to Business Critical

Sakshi Gupta shares a script:

Recently, we faced a requirement to upgrade large number of Azure SQL databases from general-purpose to business-critical.

As you’re aware, this scaling-up operation can be executed via PowerShell, CLI, or the Azure portal and follow the guidance mentioned here – Failover groups overview & best practices – Azure SQL Managed Instance | Microsoft Learn

Given the need to perform this task across a large number of databases, individually running commands for each server is not practical. Hence, I have created a PowerShell script to facilitate such extensive migrations.

Click through for the scenarios Sakshi tested, some important considerations, and the script itself.

Comments closed

Power BI Data Type Optimization

Nikola Ilic shows how important it can be to choose the right data types:

For demo purposes, I’ll be using a fact table that contains the data about chats performed by a customer support department of the fictitious company Customer First. This table includes approximately 9 million rows, which is not considered a large table in the context of Power BI and analytical workloads. For the sake of simplicity, let’s pretend that our model consists of only this single table. Finally, a semantic model is configured as an Import mode model. If you want to learn how your data is stored in Power BI, I suggest you start by reading this article first.

Data was loaded into Power BI from the underlying data source (SQL Server database) as-is, without any additional optimizations applied.

Nikola walks through the process of finding the most expensive columns in terms of data size and using the least precise acceptable value. One other thing that I commonly see is identity columns or other keys on fact tables. Those are very rarely necessary, because the point of a fact table is typically to aggregate it in some fashion. And these keys are unique (by design), meaning they won’t compress very well and will take up a lot of space. Looking at Nikola’s example, my next question would be, knowing that the name of the table is factChat, does chatID tie to some chat dimension? If not, is it actually necessary for reporting? Again, if not, that could shave off another 60 MB or so from the data model.

Comments closed

Finding Columns in Memory in Power BI Direct Lake Mode

Chris Webb goes searching:

As you probably know, in Power BI Direct Lake mode column data is only loaded into memory when it is needed by a query. I gave a few examples of this – and how to monitor it using DMVs – in this blog post from last year. But which columns are loaded into memory in which circumstances? I was thinking about this recently and realised I didn’t know for sure, so I decided to do some tests. Some of the results were obvious, some were a surprise.

Read on for the answer.

Comments closed

Troubleshooting Non-Editable Power Query Parameters in Microsoft Fabric

Soheil Bakhshi digs into a problem:

Power Query is a powerful tool within the Microsoft Fabric environment, enabling users to manage data sources and transform data efficiently. However, a common issue you may face is that after publishing the Semantic Model, the Power Query parameters either do not appear or are greyed out, making them non-editable. In this post and its accompanying YouTube video, I’ll walk you through the steps to diagnose and fix these problems, ensuring that your parameters work as expected in your published semantic models.

Click through for the video and a pair of common reasons.

Comments closed

Auditing a SQL Server Database Configuration

Ben Johnston continues a series on auditing:

This continues my series on auditing SQL Server. The fist parts covered discovery and documentation, server level hardware audits and SQL Server engine level audits. This section examines database configuration audits. As with the previous audit sections, the boundaries for the audit can be blurry. I try to stick to configuration items only, but I also discuss some code smells and items that can impact performance or might be covered in a code review. You will need to determine the scope of your audits and how much you want to cover in this portion of the audit.

This follows the patterns of the previous audits, starting with a list of items to validate, followed by key points to examine for each of those items, and ends with scripts or tactics to gather the actual audit results. As with previous audits, there are multiple methods to examine each item, but I generally prefer scripts due to their repeatability, especially when they need to be run by another team.

Read on for a general template, followed by details on each section.

Comments closed