
Day: July 17, 2024

Extracting the End of a String in R

Steven Sanderson just wants the conclusion:

Hey useR’s! Today, we’re going to discuss a neat trick: extracting substrings starting from the end of a string. We’ll cover how to achieve this using base R, stringr, and stringi. By the end of this post, you’ll have several tools in your R toolbox for string manipulation. Let’s get started!

Read on to see how you can do it in three separate libraries.


Finding Query History in Snowflake

Kevin Wilkie digs into the history:

If you’re running an audit process in SQL Server, you can do this fairly easily. But how many of us work in a place that requires this or has enough space to do this? Sadly, very few of us…

If you have Query Store running (and why wouldn’t you?) you can find a specific query you were running. But you have to have a specific something you can search for…

But can you get just a list of what you specifically ran in order? Not that I can find….

My recollection is that, if you’re using Azure SQL Database, auditing is on by default (or you can turn it on with a toggle switch or radio button). For Azure Synapse Analytics dedicated SQL pools, there are a few DMVs that cover query operations. But yeah, in general, it’s something you’d need to enable first.


An Overview of Slowly Changing Dimensions

Reza Rad talks slowly changing dimensions:

If you want to use Power BI, Microsoft Fabric, or any other data analytics tools, one of the key concepts to understand when working with a data warehouse system is the SCD (Slowly Changing Dimension). I will do this in a series of at least two articles. The first one (this one) will be on the concept of what SCD is, its meaning, and its different types. Then, the next one will discuss how to implement SCD types (such as Type 2) using Microsoft Fabric and Power BI.

Reza focuses on SCD types 0-4 but does briefly touch on types 5-7 (I’d never heard of SCD type 7 before).
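
For readers new to the idea, here is a minimal sketch of the difference between Type 1 (overwrite) and Type 2 (keep history) changes. It is written in plain Python rather than Fabric or Power BI, and the `CustomerRow` class, its column names, and both helper functions are hypothetical, made up for illustration rather than taken from Reza's article.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    """One row of a hypothetical customer dimension (illustrative only)."""
    surrogate_key: int
    customer_id: str                  # business (natural) key
    city: str
    valid_from: date
    valid_to: Optional[date] = None   # None means "still current"


def apply_type1(rows: List[CustomerRow], customer_id: str, new_city: str) -> List[CustomerRow]:
    """Type 1: overwrite the attribute in place, so no history is kept."""
    for row in rows:
        if row.customer_id == customer_id:
            row.city = new_city
    return rows


def apply_type2(rows: List[CustomerRow], customer_id: str, new_city: str,
                change_date: date) -> List[CustomerRow]:
    """Type 2: close the current row and append a new version of it."""
    current = next(
        (r for r in rows if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is None or current.city == new_city:
        return rows  # unknown customer or no change: nothing to version
    current.valid_to = change_date
    next_key = max(r.surrogate_key for r in rows) + 1
    rows.append(CustomerRow(next_key, customer_id, new_city, change_date))
    return rows
```

With Type 2, a customer who moves ends up with two rows: the old one with `valid_to` filled in, and a new current one with a fresh surrogate key. With Type 1, there is still a single row and the old city is gone.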


Troubleshooting Transactional Replication Latency in SQL Server

Bobirmirzo Arslanov has a two-parter around troubleshooting transactional replication latency issues. Part 1 looks at architecture and some of the distribution tables:

Before you dive into solving any issue, you need to fully understand the type of environment you have as there might have been changes you are unaware of. An easy way to do that is to run the script SQLServer/Script Replication Topology (sqlserver-parikh/SQLServer on GitHub, master branch), which gives output like below.

Part 2 digs in deeper:

If wait time is high compared to CPU time, check wait type and troubleshoot accordingly. For example, on the above example we faced MEMORY_ALLOCATION_EXT wait_type but duration is 0. So, we are not waiting.

If CPU time is higher, this means log thread is running but latency is being observed because you have high load.

Check out both posts. I’d really like to see a similar article for merge replication, as it’s a lot harder to troubleshoot.
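
The wait-versus-CPU rule of thumb quoted above can be sketched as a small Python helper. This is only an illustration: the function name, its parameters, and the strict greater-than comparison are assumptions of mine, and the inputs are presumed to be values you have already read from SQL Server (for example, the `cpu_time`, `wait_time`, and `wait_type` columns of `sys.dm_exec_requests`).

```python
def triage_thread(cpu_time_ms: int, wait_time_ms: int, wait_type: str = "") -> str:
    """Return a rough hint from comparing wait time with CPU time.

    Hypothetical helper: the plain comparison is an assumption, since
    the original posts only say "high compared to".
    """
    if wait_time_ms > cpu_time_ms:
        # Mostly waiting: the wait type tells you where to look next.
        return f"waiting: investigate wait type {wait_type or 'unknown'}"
    if cpu_time_ms > wait_time_ms:
        # Mostly working: the thread is running, so latency points to load.
        return "busy: thread is running; latency likely comes from high load"
    return "inconclusive: collect another sample"
```

For the case described in Part 2, a `MEMORY_ALLOCATION_EXT` wait type with a wait duration of 0 and non-zero CPU time lands in the "busy" branch, which matches the conclusion that the thread is not really waiting.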


Unit Testing a Database

Olivier Van Steenlandt builds some tests:

In the past few years, I learned much about collaborative data warehouse development and deployment automatization by using Database Projects (SSDT) and Azure DevOps (and other tools).

I had my fair share of learning curves, making mistakes, and having great learning opportunities. Lately, I started my next journey to learn about Unit Testing for data warehousing/database development.

In this data cookbook (blog post series), we will discover the wonderful world and different flavors of unit testing from a data perspective. In the coming weeks/months, new data recipes (blog posts) will be released bi-weekly.

This first post provides an overview of the topic and includes links to three tools, though SQL Test is an implementation of tSQLt. Of the three, Visual Studio tests are the best of the bunch, though they’re more integration tests than unit tests.


Script Tasks in Azure DevOps Pipelines

Justin Bird builds a script task:

There are several dedicated script tasks that can be used for different purposes, some of the tasks have shortcut tasks which require much less input and are arguably easier to read. In the examples below, I have included the required inputs needed to run the scripts but in most cases, other optional inputs are available.

This looks pretty similar to the way you’d do it in GitHub Actions workflows as well—no surprise there, considering that the same group of people who built pipelines in ADO handled Actions in GitHub.
