Press "Enter" to skip to content

Author: Kevin Feasel

Extracting the End of a String in R

Steven Sanderson just wants the conclusion:

Hey useR’s! Today, we’re going to discuss a neat trick: extracting substrings starting from the end of a string. We’ll cover how to achieve this using base R, stringr, and stringi. By the end of this post, you’ll have several tools in your R toolbox for string manipulation. Let’s get started!

Read on to see how you can do it in three separate libraries.

Comments closed

An Overview of Slowly Changing Dimensions

Reza Rad talks slowly changing dimensions:

If you want to use Power BI, Microsoft Fabric, or any other data analytics tools, one of the key concepts to understand when working with a data warehouse system is the SCD (Slowly Changing Dimension). I will do this in a series of at least two articles. The first one (this one) will be on the concept of what SCD is, its meaning, and its different types. Then, the next one will discuss how to implement SCD types (such as Type 2) using Microsoft Fabric and Power BI.

Reza focuses on SCD types 0-4 but does briefly touch on types 5-7 (of which, I’d never heard of SCD type 7).

Comments closed

Finding Query History in Snowflake

Kevin Wilkie digs into the history:

If you’re running an audit process in SQL Server, you can do this fairly easily. But how many of us work in a place that requires this or has enough space to do this? Sadly, very few of us…

If you have Query Store running (and why wouldn’t you?) you can find a specific query you were running. But you have to have a specific something you can search for…

But can you get just a list of what you specifically ran in order? Not that I can find….

My recollection is that, if you’re using Azure SQL Database, that auditing is on by default (or you can turn it on with a toggle switch or radio buttion). For Azure Synapse Analytics dedicated SQL pools, there are a few DMVs that cover query operations. But yeah, in general, it’s something you’d need to enable first.

Comments closed

Troubleshooting Transactional Replication Latency in SQL Server

Bobirmirzo Arslanov has a two-parter around troubleshooting transactional replication latency issues. Part 1 looks at architecture and some of the distribution tables:

Before you dive into solving any issue, you need to fully understand the type of environment you have as there might have been changes you are unaware of. An easy way to do that is to run script  SQLServer/Script Replication Topology at master · sqlserver-parikh/SQLServer (github.com) which gives output like below.

Part 2 digs in deeper:

If wait time is high compared to CPU time, check wait type and troubleshoot accordingly. For example, on the above example we faced MEMORY_ALLOCATION_EXT wait_type but duration is 0. So, we are not waiting.

If CPU time is higher, this means log thread is running but latency is being observed because you have high load.

Check out both posts. I’d really like to see a similar article for merge replication, as it’s a lot harder to troubleshoot.

Comments closed

Unit Testing a Database

Olivier Van Steenlandt builds some tests:

In the past few years, I learned much about collaborative data warehouse development and deployment automatization by using Database Projects (SSDT) and Azure DevOps (and other tools).

I had my fair share of learning curves, making mistakes, and having great learning opportunities. Lately, I started my next journey to learn about Unit Testing for data warehousing/database development.

In this data cookbook (blog post series), we will discover the wonderful world and different flavors of unit testing from a data perspective. In the coming weeks/months, new data recipes (blog posts) will be released bi-weekly.

This first post provides an overview of the topic and includes links to three tools, though SQL Test is an implementation of tSQLt. Of the three, Visual Studio tests are the best of the bunch, though they’re more integration tests than unit tests.

Comments closed

Script Tasks in Azure DevOps Pipelines

Justin Bird builds a script task:

There are several dedicated script tasks that can be used for different purposes, some of the tasks have shortcut tasks which require much less input and are arguably easier to read. In the examples below, I have included the required inputs needed to run the scripts but in most cases, other optional inputs are available.

This looks pretty similar to the way you’d do it in GitHub Actions workflows as well—no surprise there, considering that the same group of people who built pipelines in ADO handled Actions in GitHub.

Comments closed

One Problem with Scrollbars in Power BI

Chris Webb focuses on the performance aspect of scrollbars on tables:

Concluding my series of blog posts on seemingly harmless things you can do in a Power BI report that can lead to performance problems and the “This visual has exceeded the available resources” error (see also “Calculate(), Filter() and DAX memory usage” and “DAX measures that never return blank“), in this post I’ll show how table visuals with vertical scrollbars that potentially show thousands of rows can be a Bad Thing.

I’d also note the aesthetic problem: the intent of a dashboard is to be glanceable, meaning that a user can gain sufficient understanding of what is happening without needing to click, drag, sort, filter, or otherwise manipulate the dashboard. This means, if you are intending to create a dashboard (versus a report), vertical scrollbars form a second sort of issue: you’re obscuring data that you consider important enough to show to the end user.

Comments closed

Tips to Identify Poorly-Performing Code

Rich Benner shares a few tricks:

So, you have a database or block of code. You’ve been told to “make it fast” but you’re not sure where to start. I’ve got you. We’re going to create a process to follow to ensure we can tune effectively and prove that we’ve made things faster. It’s all about having a structure when performance tuning.

Read on for Rich’s recommendations on how to identify poorly-performing code.

Comments closed

Variable Evaluation Time in DAX

Marco Russo and Alberto Ferrari evaluate some variables:

Variables are an important element of DAX to improve readability and performance. Variables are created through the VAR keyword, which can be repeated multiple times for multiple variables, followed by RETURN, the keyword that defines the result of the expression.

Read on for several examples of how you can use variables, and even if it’s possible for the engine to ignore variable declarations if the variable in question never sees use.

Comments closed