Curated SQL – Page 316 – A Fine Slice Of SQL Server

Comparing Non-Standard Time Periods in Power BI

Published 2023-08-01 by Kevin Feasel

Marco Russo and Alberto Ferrari are back in school:

A requirement to apply the following technique is that every day can belong to a term or not, but there are no overlaps between terms. Indeed, if we had overlapping periods, we should create a different solution based on the Comparing different time periods pattern. In the case of school terms, we consider the case of three terms per year, where the first term starts in September of one year, and the last term ends in July of the following year. Therefore, the academic year is identified by two consecutive numbers, such as 2016-2017 (often shortened to 2016-17).

The business requirement is to compare one term with the previous term (within the same academic year or the previous one when we compare the first term of a year) and one term with the same term in the previous year. The goal is to obtain a result similar to the following one.

This turns out to be a little bit of a challenge, though Marco and Alberto have the solution for us.
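To make the two comparisons concrete, here is a minimal Python sketch of the term arithmetic only. It is not Marco and Alberto's DAX; the `Term` type and helper names are hypothetical, and it assumes exactly three non-overlapping terms per academic year as described in the excerpt.

```python
from typing import NamedTuple

TERMS_PER_YEAR = 3  # assumption taken from the excerpt: three terms per academic year


class Term(NamedTuple):
    start_year: int  # 2016 stands for academic year 2016-17
    number: int      # 1, 2, or 3 within the academic year


def previous_term(term: Term) -> Term:
    """Return the term just before this one, wrapping into the prior academic year."""
    if term.number == 1:
        return Term(term.start_year - 1, TERMS_PER_YEAR)
    return Term(term.start_year, term.number - 1)


def same_term_previous_year(term: Term) -> Term:
    """Return the same term number in the prior academic year."""
    return Term(term.start_year - 1, term.number)


def label(term: Term) -> str:
    """Format a term using the shortened academic-year style, e.g. '2016-17 T1'."""
    return f"{term.start_year}-{(term.start_year + 1) % 100:02d} T{term.number}"
```

The first term of a year is the only case that crosses an academic-year boundary, which is the wrinkle the business requirement calls out.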


Running Queries across SQL Server Databases

Published 2023-08-01 by Kevin Feasel

Ed Pollack has a query to run in many places:

A challenge that reappears periodically in the world of databases (especially database management) is the need to run code on a subset of databases and to do so in a nuanced manner. Some maintenance or metrics collection processes can be simply run against every database on a server with no ill-effect, but others may be app-specific, or need to omit specific sets of databases.

This article dives into how to create and customize your own solution, tackling everything from filtering databases to validating schema elements to error-handling.

It does surprise me a bit that there’s no officially supported built-in solution for this. I’ve used sp_foreachdb a lot because it’s readily available, free, and works better than sp_msforeachdb, though there are several available options for this task.
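As a rough illustration of the filtering step only, here is a small Python sketch. It is not Ed's solution and not how sp_foreachdb works internally; the function names and wildcard parameters are hypothetical, and actually executing the batches would need a database driver that this sketch leaves out.

```python
from fnmatch import fnmatchcase

SYSTEM_DATABASES = {"master", "model", "msdb", "tempdb"}


def select_databases(names, include="*", exclude=(), skip_system=True):
    """Pick database names matching `include` and none of the `exclude` patterns.

    Patterns use shell-style wildcards and are compared case-insensitively.
    """
    selected = []
    for name in names:
        lowered = name.lower()
        if skip_system and lowered in SYSTEM_DATABASES:
            continue
        if not fnmatchcase(lowered, include.lower()):
            continue
        if any(fnmatchcase(lowered, pattern.lower()) for pattern in exclude):
            continue
        selected.append(name)
    return selected


def build_batches(names, statement):
    """Prefix the statement with a USE for each database, doubling any ']' in the name."""
    return [f"USE [{name.replace(']', ']]')}]; {statement}" for name in names]
```

Keeping selection separate from execution makes it easy to review the list of target databases before anything runs.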


Updates to Change Data Capture in ADF

Published 2023-08-01 by Kevin Feasel

Chen Hirsh looks at some updates:

A few months ago I wrote a post about the new change data capture (CDC) feature in Azure Data Factory (ADF) – https://www.madeiradata.com/post/the-wind-of-change-change-data-capture-in-data-factory

Change data capture, as the name suggests, gets the data changes on one system, and replicates them to another. Since this is a task that data engineers do a lot, this was a very welcome addition to ADF.

In this post, we’ll explore what is new on this front.

Click through for what’s new, though do be cognizant of which items are in GA and which are still in preview.
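For anyone new to the idea, the core of change data capture can be sketched in a few lines of Python. This shows the general concept of replaying captured changes against a target, not how ADF implements it; the event shape (`op`, `key`, `row`) is a hypothetical one chosen for the example.

```python
def apply_changes(target, changes):
    """Replay an ordered list of change events against a target keyed by primary key.

    Each event is a dict with 'op' ('insert', 'update', or 'delete'), a 'key',
    and, for inserts and updates, the new 'row'. This shape is an assumption.
    """
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            target[key] = change["row"]
        elif op == "delete":
            target.pop(key, None)  # deleting an already-missing row is a no-op
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return target
```

Order matters here: replaying the same events in a different sequence can leave the target in a different state.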


Finding a Particular Query Plan in Query Store’s UI

Published 2023-08-01 by Kevin Feasel

Andrea Allred does a search:

I have this problem where I want to see how a newly released query is performing, but it may not be bad enough to make any of the canned reports that SQL Server provides in QueryStore. I was able to look up the plan handle, but always struggled to get to the query id for QueryStore, until now.

Click through for a query to retrieve the query ID and then how to find data on that particular query. I’d also recommend QDSToolbox for more detailed query analysis.


Contrasting Spark and Flink for Streaming Use Cases

Published 2023-07-31 by Kevin Feasel

Deepthi Mohan and Karthi Thyagarajan contrast two products:

Apache Flink and Apache Spark are both open-source, distributed data processing frameworks used widely for big data processing and analytics. Spark is known for its ease of use, high-level APIs, and the ability to process large amounts of data. Flink shines in its ability to handle processing of data streams in real-time and low-latency stateful computations. Both support a variety of programming languages, scalable solutions for handling large amounts of data, and a wide range of connectors. Historically, Spark started out as a batch-first framework and Flink began as a streaming-first framework.

In this post, we share a comparative study of streaming patterns that are commonly used to build stream processing applications, how they can be solved using Spark (primarily Spark Structured Streaming) and Flink, and the minor variations in their approach. Examples cover code snippets in Python and SQL for both frameworks across three major themes: data preparation, data processing, and data enrichment. If you are a Spark user looking to solve your stream processing use cases using Flink, this post is for you. We do not intend to cover the choice of technology between Spark and Flink because it’s important to evaluate both frameworks for your specific workload and how the choice fits in your architecture; rather, this post highlights key differences for use cases that both these technologies are commonly considered for.

Read on for an analysis of the two products.


Tips for Limiting Redis Failures

Published 2023-07-31 by Kevin Feasel

Phil Booth provides the ammo and we provide the feet:

Production outages are great at teaching you how not to cause production outages. I’ve caused plenty and hope that by sharing them publicly, it might help some people bypass part one of the production outage learning syllabus. Previously I discussed ways I’ve broken prod with PostgreSQL and with healthchecks. Now I’ll show you how I’ve done it with Redis too.

For the record, I absolutely love Redis. It works brilliantly if you use it correctly. The gotchas that follow were all occasions when I didn’t use it correctly.

My one addition here is to be really careful if you use Redis as persistent storage rather than a cache. Redis as a cache is easy: if the server goes down or you have trouble, you simply have more database calls than normal. Redis as persistent storage is a much more complicated beast which seems to fall over a lot more often and is significantly more finicky about drivers.
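Here is a minimal sketch of that cache-as-optional-layer idea in Python. It deliberately avoids any real Redis client: `CacheUnavailable`, `DictCache`, and `DownCache` are hypothetical stand-ins, and a real driver would raise its own connection errors instead.

```python
class CacheUnavailable(Exception):
    """Stand-in for whatever connection error a real cache client raises."""


class DictCache:
    """In-memory cache used only to exercise the sketch."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


class DownCache:
    """Cache that always fails, simulating a server outage."""

    def get(self, key):
        raise CacheUnavailable()

    def set(self, key, value):
        raise CacheUnavailable()


def get_with_cache(key, cache, load_from_db):
    """Cache-aside read: try the cache, fall back to the database if it is down."""
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except CacheUnavailable:
        return load_from_db(key)  # cache outage just means an extra database call
    value = load_from_db(key)
    try:
        cache.set(key, value)
    except CacheUnavailable:
        pass  # failing to populate the cache must not fail the read
    return value
```

When the cache is down, every read simply costs a database call. There is no equivalent fallback when Redis is the system of record.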


Mitigating Dynamic Data Masking Side-Channel Attacks

Published 2023-07-31 by Kevin Feasel

Ben Johnston wraps up a series on dynamic data masking:

This is the fifth and final part of this series on SQL Server Dynamic Data Masking. The first part in the series was a brief introduction to dynamic data masking, competing solutions, and use cases. The second part covered setting up masking and some examples. The third and fourth sections explored side channel attacks against dynamic data masking.

This final part covers mitigations to side channel attacks, additional architectural considerations and an analysis of the overall solution.

Throughout the entire series, Ben has done a good job of laying out exactly what dynamic data masking is good for—and what it isn’t good for. I tend to harp a lot on the latter but Ben keeps a reasonable approach throughout this series.


Trying Fabric Data Wrangler

Published 2023-07-31 by Kevin Feasel

Reza Rad looks at a new tool:

There is a tool (or you can consider it as an editor) in Fabric for data scientists. As a data scientist, you must work with the data, clean it, group it, aggregate it, and do other data preparation work. This might be needed to understand the data or be part of the process you do to prepare the data and load it into a table for further analysis. Data Wrangler is a tool that gives you that ability. You can use it to transform and prepare data, and even to generate Python code that makes this process part of a bigger data analytics project.

Data Wrangler has a simple-to-use graphical user interface that makes the job of a data scientist easier.

Read on for a video as well as a demo in written format.


MAXDOP by Username in Azure SQL DB

Published 2023-07-31 by Kevin Feasel

Jose Manuel Jurado Diaz comes up with a solution:

Azure SQL Database is a powerful platform that provides managed database services with built-in intelligence and robust resource management. While Azure SQL Database doesn’t have a direct implementation of the traditional Resource Governor feature available in SQL Server, we can explore a pseudo-Resource Governor approach using user-defined functions and custom tables. In this article, we’ll discuss the concept, present a sample implementation using a custom function, and highlight the possibilities it opens up for controlling CPU resources in Azure SQL Database.

Click through for the UDF and how to use it. My first inclination was to say that I couldn’t see it working well at all under load. On second thought, the function doesn’t execute once per row in a table, which is where UDF performance usually falls apart, so the overhead is probably manageable.


Tracking Power BI Import Throughput Variance

Published 2023-07-31 by Kevin Feasel

Chris Webb continues a series on using Log Analytics with Power BI:

In the second post in this series I discussed a KQL query that can be used to analyse Power BI refresh throughput at the partition level. However, if you remember back to the first post in this series, it’s actually possible to get much more detailed information on throughput by looking at the ProgressReportCurrent event, which fires once for every 10000 rows read during partition refresh.

Here’s yet another mammoth KQL query that you can use to analyse the ProgressReportCurrent event data:

Click through for the KQL query, an explanation of how it works, and some practical examples.
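The arithmetic behind event-level throughput is simple enough to sketch in Python. This is not Chris's KQL query; it only assumes, per the excerpt, that one event fires per 10,000 rows, and the function name and the use of timestamps in seconds are hypothetical choices for the example.

```python
ROWS_PER_EVENT = 10_000  # per the excerpt: one ProgressReportCurrent event per 10000 rows


def throughput_between_events(timestamps):
    """Return rows per second for each gap between consecutive event timestamps.

    `timestamps` are event times in seconds, in ascending order (an assumption).
    """
    rates = []
    for earlier, later in zip(timestamps, timestamps[1:]):
        elapsed = later - earlier
        # Guard against two events sharing a timestamp at the log's resolution.
        rates.append(ROWS_PER_EVENT / elapsed if elapsed > 0 else float("inf"))
    return rates
```

Plotting these per-gap rates over time is one way to see where in a partition refresh the throughput dips.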

