Curated SQL – Page 169 – A Fine Slice Of SQL Server

Comparing Microsoft Fabric Warehouse and Lakehouse Performance

Published 2024-06-03 by Kevin Feasel

I just can’t seem to stop doing this, checking the limits of Microsoft Fabric. In this instalment I’ll try and find some limits on the data warehouse experience and compare them with the Lakehouse experience. The data warehouse is a bit different compared to the Lakehouse, so I’ll be digging into that one first. Then I’m going to load data into the warehouse with a copy data pipeline followed by some big queries to test performance. The Fabric Capacity App will be used to check out the capacity necessary (or used for that matter).

As usual, I’m using the F2 capacity as it’s the one that should break the easiest. It’s also the cheapest one to run tests against and, as the capacity calculation isn’t dependent on the SKU (Stock Keeping Unit), you can easily translate to find out which capacity SKU will fit the workload. Remember that your workload will differ from the one shown in this blog. These tests are a comparison between the different offerings, something you could do for yourself. These blogs are a bit of a happy place as every option will get a good chance. In your work, your skills (and those of your co-workers) will be a major driver towards an option. Even if this offers the chance to learn something new!

Reitse focuses on ingesting and transforming data and the results were quite interesting.

Comments closed

Invoking a Fabric Data Factory Pipeline via REST API

Published 2024-06-03 by Kevin Feasel

Andy Leonard makes a call:

This post is current as of 30 May 2024. There are other posts by fantastic bloggers about how to use the Fabric REST API. Fabric development is progressing so fast, some of those posts are less up-to-date. Make no mistake, this post will most likely not age well, and for the very same reason. That’s ok. We bloggers live to serve. I, like all the rest, will endeavor to persevere – and we will all write more posts, Lord willing.

In this post, I share one way to invoke Fabric Data Factory pipelines using the REST API.
I will be using the web version of Postman to call REST API methods.
You can sign up for a free Postman account. Since it’s free, I encourage you to check the box to receive news and offers from them. As I mentioned in an earlier post, you can always unsubscribe if the messages are unhelpful or if they get too “chatty.”

Read on for that way.

Comments closed

Using PostGIS in the Terminal

Published 2024-06-03 by Kevin Feasel

Dian M. Fay talks turkey about terminals:

Of late, I’ve been falling down a bunch of geospatial rabbit holes. One thing has remained true in each of them: it’s really hard to debug what you can’t see.

There are ways to visualize these. Some more-integrated SQL development environments like pgAdmin recognize and plot columns of geometry type. There’s also the option of standing up a webserver to render out raster and/or vector tiles with something like Leaflet. Unfortunately, I don’t love either solution. I like psql, vim, and the shell, and I don’t want to do some query testing here and copy others into and out of pgAdmin over and over; I’m actually using Leaflet and vector tiles already, but restarting the whole server just to start debugging a modified query is a bit much in feedback loop time.

Read on for Dian’s recommendations.

Comments closed

Impossible Execution Plan Timings

Published 2024-06-03 by Kevin Feasel

Paul White puts up an article:

I showed a hidden option to make all operators report only their individual times in

More Consistent Execution Plan Timings in SQL Server 2022

. That feature isn’t complete yet, so the results aren’t perfect, and it’s not documented or supported.

I mention all that in case you are interested in the background. None of the foregoing explains what we see in this mixed mode plan. The row mode Gather Streams elapsed time ought to include its children. The batch mode Sort should just be reporting its own elapsed time.

With that understanding in mind, there’s no way the Sort could run for longer than the Gather Streams. What’s going on here?

Read on for a Paul White-level discussion of the topic, including a demo from Erik Darling.

Comments closed

Building a 100% Stacked Bar Chart in R

Published 2024-05-31 by Kevin Feasel

Tomaz Kastrun builds a visual:

Calculating cumulative percentage or percentage per group for each time can sometimes be a task with a slight twist. Let’s check this with ggplot2 and tidyverse.

Click through for three separate ways of doing this.

Comments closed

Data Encryption Options in Postgres

Published 2024-05-31 by Kevin Feasel

Greg Nokes gives us the options:

Operating system or disk-level encryption protects entire file systems or disks. This method is application-agnostic and offers encryption with minimal overhead. Think technologies like luks in Linux or FileVault in MacOS.

Read on for four options. They’re very similar to options available in SQL Server, so it’s easy enough to compare implementation ideas.

Comments closed

Uniqueidentifier Ordering in SQL Server

Published 2024-05-31 by Kevin Feasel

Jose Manuel Jurado Diaz clears up the mystery:

Today, I worked on a service request that our customer asked about how SQL Server sorts the uniqueidentifier data type. We know that uniqueidentifier store globally unique identifiers (GUIDs). GUIDs are widely used for unique keys due to their extremely low probability of duplication. One common method to generate a GUID in SQL Server is by using the NEWID() function. However, the ordering of GUIDs, especially those generated by NEWID(), can appear non-intuitive. I would like to share my lessons learned how to determine the ordering method using uniqueidentifier and NEWID().

Yeah, things get pretty weird because x86-x64 processors (Intel and AMD both) are Little Endian, and only the fourth chunk of a GUID is Big Endian.

Comments closed

Finding Duplicate Post Titles and Tuning the Query

Published 2024-05-31 by Kevin Feasel

Erik Darling makes a friend:

I’m going to be totally open and honest with you, dear reader: I’ve been experimenting with… AI.

See, I’m just a lonely independent consultant, and sometimes it’s just nice to have someone to talk to. It’s also kind of fun to take a query idea you have, and ask “someone” else to write it to see what they’d come up with.

ChatGPT (for reference, 4 and 4o) does a rather okay job sometimes. In fact, when I ask it to write a query, it usually comes up with a query that looks a lot like the ones that I have to fix when I’m working with clients.

Considering that the clients probably stole the query idea from Stack Overflow as well, that makes sense. But there was a clever trick that the query returned, so check it out.

Comments closed

Unhelpful Error Restoring Azure SQL MI Database

Published 2024-05-31 by Kevin Feasel

Kendra Little encounters an error:

What’s it like to be a Database Administrator for managed databases in Azure? Sometimes it’s a painful guessing game when a routine, core operation– restoring a database – fails with a most unhelpful error.

In this case, if the restore is run via PowerShell, following Microsoft guidance, the error message is:

Restore-AzSqlInstanceDatabase: Long running operation failed with status ‘Failed’. Additional Info: An unexpected error occured while processing the request. [sic]

Somehow the misspelling of ‘occurred’ stings a bit more. Did anyone review the PR for this code?

I’m trying to weigh in my mind whether this error is worse than “String or binary data would be truncated.” One the one hand, the spelling is correct in the latter error message. On the other hand, it uses passive voice. On the gripping hand, they’re both nigh-useless error messages. Hopefully the SQL MI team fixes Kendra’s error message at least as well as the database engine fixed the latter.

Comments closed

Updates to the TidyDensity R Package

Published 2024-05-30 by Kevin Feasel

Steven Sanderson has a round of updates to share:

util_negative_binomial_aic(): Calculate the Akaike Information Criterion (AIC) for the negative binomial distribution. This function aids in model selection, helping you determine the best-fitting model for your data.

Read on for utility functions for a dozen or so distributions, as well as a few other small fixes.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Curated SQL Posts