
Curated SQL Posts

Query Compilation Time Matters

David Klee lays out an argument:

SQL Server query developers, listen up! Query execution time is not everything you should be worried about. You need to examine the parse and compilation time for each of your queries too.

Read on for the crux of David’s argument. There are things you can do about query compilation time, starting with database design (normalize tables, include key constraints, add appropriate indexes, etc.) and continuing with query design (keep queries simple, limit use of functions, limit use of nested views, break complicated queries into multiple steps with temp tables as intermediaries, etc.). One mitigating factor, however, is that compilation time matters much less if you retain the compiled plan for a while and reuse it frequently.
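
If you want a quick look at which cached plans cost the most to compile, the showplan XML carries a CompileTime attribute (in milliseconds) that you can pull out of the plan cache. Here’s a rough Python-plus-pyodbc sketch of that idea; the connection string is a placeholder and reading these DMVs requires VIEW SERVER STATE, so treat it as a starting point rather than David’s approach.

    # Sketch: pull compile time (ms) out of cached showplan XML via pyodbc.
    # The connection string is a placeholder; reading these DMVs needs VIEW SERVER STATE.
    import pyodbc

    QUERY = """
    SELECT TOP (20)
        st.[text] AS query_text,
        qp.query_plan.value(
            'declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
             (//p:QueryPlan/@CompileTime)[1]', 'int') AS compile_time_ms
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    WHERE qp.query_plan IS NOT NULL
    ORDER BY compile_time_ms DESC;
    """

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};SERVER=yourserver;"
        "DATABASE=master;Trusted_Connection=yes;TrustServerCertificate=yes"
    )
    for text, compile_ms in conn.execute(QUERY):
        print(f"{compile_ms or 0:>8} ms  {(text or '')[:80]!r}")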


Case Sensitivity in Power BI

Kurt Buhler is going to raise my blood pressure this morning:

Most Power BI models are case-insensitive, meaning that “Bonk” is the same as “BONK”. However, Power BI data models can also be created as case-sensitive if you create a Direct Lake model in Fabric, or create a new model with external tools and enter a case-sensitive collation property. Two otherwise identical models which differ only in this case-sensitivity may produce different results, even though they’re using the same data, DAX, relationships, and tables.

It’s useful to know how case-sensitivity affects your model and its query results. You should also be able to identify and validate whether your model is case-sensitive. This is particularly important in the following scenarios:

Read on for those scenarios and how you can fix the problem of case sensitivity. My official stance on case sensitivity, by the way, is that applications should be case-insensitive on input but retain casing on output, so “dog” = “Dog” = “DOG” for sorting and querying, but if I saved “Dog” then that’s what should display.
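
That stance is easy enough to encode, by the way. Here’s a minimal Python sketch of the idea (nothing to do with Kurt’s post itself): compare, sort, and deduplicate on a case-folded key, but display whatever casing was originally saved.

    # Case-insensitive on input, original casing on output.
    records = ["Dog", "dog", "DOG", "cat"]

    # Deduplicate on a folded key, keeping the first casing that was saved.
    seen = {}
    for value in records:
        seen.setdefault(value.casefold(), value)

    # Sort and compare on the folded key, but display the stored casing.
    for value in sorted(seen.values(), key=str.casefold):
        print(value)   # prints "cat" then "Dog"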


Whitepapers for Oracle and SQL Server in Azure

Kellyn Gorman has been busy:

I’ve been pretty busy with work and travel, but I finally got an official Silk Github repository to publish a couple new white papers and sizing assessment worksheets for customer access.  These are primarily Oracle and SQL Server to Azure focused white papers, but I will be publishing ones on GCP next, to be followed by AI and other database platforms soon.

Click through for links to the documents.


(Near)-Real-Time Analysis with Microsoft Fabric

Reza Rad continues a series on Microsoft Fabric:

Microsoft Fabric offers a workload for real-time solutions. Real-Time Analytics can be used for streaming data, such as the data coming from IoT devices. It can be used not only to ingest the data but also to analyze it and use it for other Fabric workloads, such as data science. In this article and video, you will learn what Real-Time Analytics in Microsoft Fabric is and how it works.

Read on for a detailed demo.
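
Reza’s demo happens in the Fabric portal, but once data lands in a KQL database you can also query it from code. Here’s a minimal, hypothetical sketch using the azure-kusto-data Python package; the query URI, database name, and table name are placeholders you’d swap for your own.

    # Sketch: query a Fabric KQL database from Python (azure-kusto-data).
    # The query URI, database name, and table name are placeholders.
    from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

    query_uri = "https://<your-kql-database-query-uri>"   # from the database's details page
    kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(query_uri)
    client = KustoClient(kcsb)

    response = client.execute("SensorDb", "SensorReadings | take 10")
    for row in response.primary_results[0]:
        print(row)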


A Primer on A/B Testing for Engineers

John Mount performs some testing:

I’d like to discuss a simple variation of A/B testing in an engineering style.
By “an engineering style” I mean:

  • We will work a simulated example to see that the system works as claimed.
  • We will exhibit examples of problems before trying to fix them.
  • We will demonstrate all of the top level claims as calculations, and not delegate these to references.
  • We will leave fundamental math to the references, and not try to re-derive it.

In my opinion, far too few A/B testing treatments check soundness, even on simulated data. This makes it easy for such articles to leave out important steps. If a relied-on reference omits a step, the derived work may have to do the same.
We will implement the experiment design directly, instead of using a canned power calculator, so we have a place to discuss some of the issues in A/B test design.

This is an excellent dive into the topic and I highly recommend taking the time to read it.
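
In that spirit of checking the machinery on simulated data, here’s a small (and far less thorough) Python sketch: simulate many A/A and A/B experiments and see how often a two-proportion z-test rejects. The conversion rates, sample size, and alpha are arbitrary illustration values, not numbers from John’s article.

    # Sketch: check a two-proportion z-test on simulated A/A and A/B data.
    # Rates, sample sizes, and alpha are illustrative choices only.
    import numpy as np
    from statsmodels.stats.proportion import proportions_ztest

    rng = np.random.default_rng(2023)
    n_per_arm, alpha, n_sims = 10_000, 0.05, 2_000

    def rejection_rate(p_a: float, p_b: float) -> float:
        """Fraction of simulated experiments where the test rejects at alpha."""
        rejections = 0
        for _ in range(n_sims):
            conv_a = rng.binomial(n_per_arm, p_a)
            conv_b = rng.binomial(n_per_arm, p_b)
            _, p_value = proportions_ztest([conv_a, conv_b], [n_per_arm, n_per_arm])
            rejections += p_value < alpha
        return rejections / n_sims

    # A/A run: the rejection rate should sit close to alpha (false-positive check).
    print("A/A:", rejection_rate(0.050, 0.050))
    # A/B run with a real lift: the rejection rate is the empirical power.
    print("A/B:", rejection_rate(0.050, 0.055))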


Time Series Stationarity Testing in R

Steven Sanderson isn’t just spinning in place:

Before we delve into the ts_adf_test() function, let’s understand the concept behind it. The Augmented Dickey-Fuller (ADF) test is a crucial tool in time series analysis. It’s like the Sherlock Holmes of time series data, helping us detect whether a series is stationary or not. Stationarity is a fundamental assumption in time series modeling because many models work best when applied to stationary data.

So, why “Augmented”? Well, it’s an extension of the original Dickey-Fuller test that accounts for more complex relationships within the time series data.

Click through to see how you can use the ts_adf_test() function to get a better feel for whether a time series is stationary.
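
Steven’s function is R; if you work in Python, the same Augmented Dickey-Fuller test lives in statsmodels as adfuller(). Here’s a quick sketch, as a rough analogue only, run on a simulated random walk (non-stationary) and its first difference (stationary).

    # Sketch: ADF test in Python via statsmodels, a rough analogue of ts_adf_test().
    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(42)
    random_walk = np.cumsum(rng.normal(size=500))   # unit root: should look non-stationary
    differenced = np.diff(random_walk)              # first difference: should look stationary

    for name, series in [("random walk", random_walk), ("differenced", differenced)]:
        stat, p_value, *_ = adfuller(series)
        verdict = "stationary" if p_value < 0.05 else "non-stationary"
        print(f"{name:>12}: ADF stat {stat:6.2f}, p-value {p_value:.4f} -> {verdict}")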


Running Apache Flink Jobs from HDInsight

Sairam Yeturi builds a streaming job:

Have you already created your first Apache Flink® cluster and submitted your streaming job on it with HDInsight on AKS?

Well, if you have yet to do that, let me help you get started.

Click through for a step-by-step walkthrough on how to create a Flink-centric HDInsight cluster on Azure Kubernetes Service and how to create a new job, assuming you already have the JAR file for that job.
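
The walkthrough assumes a pre-built JAR, but if you want to prototype the job logic first, Flink also ships a Python API. Here’s a minimal PyFlink sketch with toy data, executed locally, just to show the shape of a streaming job; it isn’t part of Sairam’s post.

    # Minimal PyFlink sketch (runs locally): the shape of a streaming job,
    # independent of the JAR-based submission in the linked walkthrough.
    from pyflink.datastream import StreamExecutionEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    env.set_parallelism(1)

    # Toy source; a real job would read from Kafka, Event Hubs, etc.
    readings = env.from_collection([("sensor-1", 20.5), ("sensor-2", 31.0), ("sensor-1", 22.1)])

    (readings
        .filter(lambda r: r[1] > 21.0)                      # keep "hot" readings
        .map(lambda r: f"{r[0]} is running hot: {r[1]}")    # format an alert string
        .print())

    env.execute("hot-sensor-alerts")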


A Primer on Boyce-Codd Normal Form

I have a new video:

In this video, we drill into one of the two most important normal forms, learning what Boyce-Codd Normal Form (BCNF) is and how you can get to it, and walking through a practical example. We also learn why I cast so much shade on 2nd and 3rd Normal Forms.

Boyce-Codd Normal Form is one of the two most important normal forms, and I’m pretty happy with the way this video came together to explain how you can get from 1NF into BCNF, as well as the specific benefits this provides.
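
Not from the video, but here’s a toy way to see the core rule in code: BCNF demands that the determinant of every non-trivial functional dependency be a superkey. This hypothetical Python check tests whether a functional dependency holds in a sample table (the table and columns are made up).

    # Toy check: does the functional dependency X -> Y hold in this table?
    # BCNF requires the determinant of every non-trivial FD to be a superkey.
    def fd_holds(rows, determinant, dependent):
        seen = {}
        for row in rows:
            key = tuple(row[col] for col in determinant)
            value = tuple(row[col] for col in dependent)
            if seen.setdefault(key, value) != value:
                return False   # same determinant, different dependent values
        return True

    enrollments = [
        {"student": "Ann", "course": "Databases", "instructor": "Codd"},
        {"student": "Bob", "course": "Databases", "instructor": "Codd"},
        {"student": "Ann", "course": "Statistics", "instructor": "Fisher"},
    ]

    # course -> instructor holds, but course is not a key of this table,
    # so the table violates BCNF and should be decomposed.
    print(fd_holds(enrollments, ["course"], ["instructor"]))   # True
    print(fd_holds(enrollments, ["student"], ["course"]))      # False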


Monitoring Power BI Gateways with Microsoft Fabric

Tom Martens builds a solution:

No matter what, when the on-premises gateways are not working as expected, data will not refresh, and DirectQuery queries will not succeed. For this reason, I consider it a good idea to track the well-being of these valuable resources. This article describes a solution built with Microsoft Fabric. It’s not necessary to use Fabric, and it’s also not necessary to build a solution on your own. If you want to track the well-being of your on-premises data gateways but do not want to build something, I recommend using the solution by Rui Romano, which you can find here: https://github.com/RuiRomano/pbigtwmonitor

I built this monitoring solution focusing on the well-being of the on-premises data gateway. I might extend this solution in the future, but for now, it’s about the availability of the on-premises data gateway and the data gateway connections. Availability and analysis will follow during the coming weeks.

Click through for Tom’s solution.
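
If you’d rather start with a quick poll of your own before building out something like this, the Power BI REST API has a gateways endpoint that lists the on-premises gateways your account can see. A minimal sketch, assuming you’ve already acquired an Azure AD access token with the appropriate Power BI scope (token acquisition is left out here):

    # Sketch: enumerate on-premises data gateways via the Power BI REST API.
    # Assumes ACCESS_TOKEN holds a valid Azure AD token for the Power BI service.
    import requests

    ACCESS_TOKEN = "<your-azure-ad-access-token>"   # placeholder

    response = requests.get(
        "https://api.powerbi.com/v1.0/myorg/gateways",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()

    for gateway in response.json()["value"]:
        print(gateway["id"], gateway["name"], gateway.get("type"))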
