Press "Enter" to skip to content

Curated SQL Posts

Decoding URIs in Power Query

Imke Feldmann decodes a URI:

When working with cloud data sources you might encounter links where spaces show up as “%20” instead for example. This is called URI escaping or encoding. In Power Query there is a native function that does this kind of encoding: Uri.EscapeDataString – PowerQuery M | Microsoft Learn
However, a native function that does just the opposite is missing. Fortunately, this can be achieved with a custom function that I have found in the Power BI forum:

Click through for that function.

Comments closed

Secrets Management in Docker Swarm

Andrew Pruski has a secret:

In a previous post I went through how to run SQL Server in Docker Swarm and in this post I want to continue delving into the Docker Swarm platform by looking at secrets.

Like Kubernetes, Docker Swarm has functionality to store sensitive data in secrets. When running SQL Server in containers we usually have to manually type out the environment variable for the MSSQL_SA_PASSWORD which isn’t great if we want to store that code in a public repository so being able to obfuscate that password is great!

Read on for a failed first attempt, followed by analysis and then success.

Comments closed

Set Intersection of Vectors in R

Steven Sanderson performs set intersection of vectors:

Welcome to another exciting blog post where we delve into the world of R programming. Today, we’ll be discussing the intersect() function, a handy tool that helps us find the common elements shared between two or more vectors in R. Whether you’re a seasoned R programmer or just starting your journey, this function is sure to become a valuable addition to your toolkit.

Click through to see how the function works.

Comments closed

Cumulative Sums in KQL

Robert Cain continues a series on KQL:

Seeing cumulative totals on a row by row basis is a common need. Think of your bank account, when you look at your ledger it shows each individual transaction. Beside each individual transaction is the current amount in your account, as offset by the amount of the current transaction.

This technique is known as cumulative sums. The Kusto Query Language provides the ability to create cumulative sums using the Windowing Function, row_cumsum. In this post we’ll see how to implement it.

Read on to see how it works and what the results look like.

Comments closed

Comparing Non-Standard Time Periods in Power BI

Marco Russo and Alberto Ferrari are back in school:

A requirement to apply the following technique is that every day can belong to a term or not, but there are no overlaps between terms. Indeed, if we had overlapping periods, we should create a different solution based on the Comparing different time periods pattern. In the case of school terms, we consider the case of three terms per year, where the first term starts in September of one year, and the last term ends in July of the following year. Therefore, the academic year is identified by two consecutive numbers, such as 2016-2017 (often shortened to 2016-17).

The business requirement is to compare one term with the previous term (within the same academic year or the previous one when we compare the first term of a year) and one term with the same term in the previous year. The goal is to obtain a result similar to the following one.

This turns out to be a little bit of a challenge, though Marco and Alberto have the solution for us.

Comments closed

Running Queries across SQL Server Databases

Ed Pollack has a query to run in many places:

A challenge that reappears periodically in the world of databases (especially database management) is the need to run code on a subset of databases and to do so in a nuanced manner. Some maintenance or metrics collection processes can be simply run against every database on a server with no ill-effect, but others may be app-specific, or need to omit specific sets of databases.

This article dives into how to create and customize your own solution, tackling everything from filtering databases to validating schema elements to error-handling.

It does surprise me a bit that there’s no officially supported built-in solution for this. I’ve used sp_foreachdb a lot because it’s readily available, free, and works better than sp_msforeachdb, though there are several available options for this task.

Comments closed

Updates to Change Data Capture in ADF

Chen Hirsh looks at some updates:

A few months ago I wrote a post about the new feature of change data capture (CDC) on Azure data factory (ADF) – https://www.madeiradata.com/post/the-wind-of-change-change-data-capture-in-data-factory

Change data capture, as the name suggests, gets the data changes on one system, and replicates them to another. Since this is a task that data engineers do a lot, this was a very welcome addition to ADF.

In this post, we’ll explore what is new on this front.

Click through for what’s new, though do be cognizant of which items are in GA and which are still in preview.

Comments closed

Finding a Particular Query Plan in Query Store’s UI

Andrea Allred does a search:

I have this problem where I want to see how a newly released query is performing, but it may not be bad enough to make any of the canned reports that SQL Server provides in QueryStore. I was able to look up the plan handle, but always struggled to get to the query id for QueryStore, until now.

Click through for a query to retrieve the query ID and then how to find data on that particular query. I’d also recommend QDSToolbox for more detailed query analysis.

Comments closed

Contrasting Spark and Flink for Streaming Use Cases

Deepthi Mohan and Karthi Thyagarajan contrast two products:

Apache Flink and Apache Spark are both open-source, distributed data processing frameworks used widely for big data processing and analytics. Spark is known for its ease of use, high-level APIs, and the ability to process large amounts of data. Flink shines in its ability to handle processing of data streams in real-time and low-latency stateful computations. Both support a variety of programming languages, scalable solutions for handling large amounts of data, and a wide range of connectors. Historically, Spark started out as a batch-first framework and Flink began as a streaming-first framework.

In this post, we share a comparative study of streaming patterns that are commonly used to build stream processing applications, how they can be solved using Spark (primarily Spark Structured Streaming) and Flink, and the minor variations in their approach. Examples cover code snippets in Python and SQL for both frameworks across three major themes: data preparation, data processing, and data enrichment. If you are a Spark user looking to solve your stream processing use cases using Flink, this post is for you. We do not intend to cover the choice of technology between Spark and Flink because it’s important to evaluate both frameworks for your specific workload and how the choice fits in your architecture; rather, this post highlights key differences for use cases that both these technologies are commonly considered for.

Read on for an analysis of the two products.

Comments closed

Tips for Limiting Redis Failures

Phil Booth provides the ammo and we provide the feet:

Production outages are great at teaching you how not to cause production outages. I’ve caused plenty and hope that by sharing them publicly, it might help some people bypass part one of the production outage learning syllabus. Previously I discussed ways I’ve broken prod with PostgreSQL and with healthchecks. Now I’ll show you how I’ve done it with Redis too.

For the record, I absolutely love Redis. It works brilliantly if you use it correctly. The gotchas that follow were all occasions when I didn’t use it correctly.

My one addition here is to be really careful if you use Redis as persistent storage rather than a cache. Redis as a cache is easy: if the server goes down or you have trouble, you simply have more database calls than normal. Redis as persistent storage is a much more complicated beast which seems to fall over a lot more often and is significantly more finicky about drivers.

Comments closed