Curated SQL – Page 242 – A Fine Slice Of SQL Server

Fabric F2 Performance

Published 2023-12-26 by Kevin Feasel

Teo Lachev has started a new series. We begin with warehouse ETL:

As inspired by Amir Netz‘s encouragement to partners to test the Fabric F2 capacity performance, I got on a quest to test what it would do to ETL loads for Fabric Warehouse. I must admit that I was skeptical that a quarter of a core would take a warehouse off the ground, but as usual, life proved me wrong and “wrong” is a big understatement of what happened.

After provisioning a Fabric F2 capacity and a warehouse, I settled on the Retail Data Model for World Wide Importers sample star schema dataset consisting of five dimension tables and one fact table. In terms of performance, I was mostly interested in how long it would take for the ADF copy activity to insert all the data (50 million rows) in the fact table. Granted, it’s a limited test but enough to rule out the technology for real-life projects. Then, I compared the performance against Azure SQL Database Serverless running on up to 2 cores and provisioned by the free trial offer that Microsoft has on Azure. To exclude impact on data transfer between regions, both technologies were provisioned on East US 2 data region, which is the region where my Power BI tenant is hosted on.

Then we have report load time:

What a better way to spend a lazy holiday afternoon than to do more Fabric performance testing? In my previous post, I shared my results from a single-threaded ETL load test to gauge the F2 ingest performance and F2 did pretty well (or at least outperformed Azure SQL DB). Will F2 hold as parallelism increases? Throughput testing is especially important for report loads because parallel tasks can run within a report, such as visuals executing DAX queries in parallel, and across reports, such as when concurrent report requests overlap.

I’m legitimately surprised at the results. I expected F2 to be barely sufficient for testing purposes. Read both posts to see how it performs and some caveats around performance.

Comments closed

Building Nested Data Types in Excel

Published 2023-12-26 by Kevin Feasel

Chris Webb shows off a feature:

A year ago support for nested data types in Excel was announced on the Excel blog, but the announcement didn’t have much detail about what nested data types are and the docs are quite vague too. I was recently asked how to create a nested data type and while it turns out to be quite easy, I thought it would be good to write a short post showing how to do it.

I came into this prepared to cringe, but it’s actually pretty cool insamuch as you’re using Excel as a UI for business work. And, let’s be honest, Excel is still the most common UI for business work out there.

Comments closed

Conditional Logic in Stored Procedures

Published 2023-12-26 by Kevin Feasel

Erik Darling has a warning:

There are two forms of conditional logic that I often have to fix in stored procedures:

Branching to run different queries at different times

Complicated join and where clause logic

The problems with both are similar in terms of performance. You see, when smart people tell you that SQL is a declarative language, and not a procedural language, they’re usually trying to get you to stop using cursors.

And that’s not always wrong or bad advice, trust me. But it also applies here.

Read on for exceptions to the rule and how you can make your life a bit easier if you do have this in place.

Comments closed

TRY: CAST, CONVERT, and PARSE

Published 2023-12-26 by Kevin Feasel

Andy Brownsword tries and tries and tries again:

In the previous post we looked at how ISNUMERIC and TRY_CAST work and why we may want to utilise the latter when building validation for our data. When SQL Server 2012 rolled around it wasn’t only TRY_CAST which was added, we also had TRY_CONVERT and TRY_PARSE introduced too.

Here we’re going to look at how those function and the differences in the outputs which they can provide. We’ll start with the sample data below as the basis for these demonstrations:

Andy focuses mostly on the behavioral differences, where I like TRY_PARSE() a lot, especially because you can control locale for dates and times. When it comes to performance, I’ve found TRY_CAST() and TRY_CONVERT() to be essentially the same performance-wise. Maybe there’s a tiny difference between the two but there’s no guarantee of it.

TRY_PARSE(), however, calls into the .NET CLI for each execution and will be considerably slower. If you’re parsing a few or a few dozen values, you probably won’t notice the difference. If you’re parsing tens of thousands of values (or more), I assure you that you’ll notice the difference.

Comments closed

Looping through SQL Server Instances in Powershell

Published 2023-12-26 by Kevin Feasel

Ajay Dwivedi has a script for us:

As a database administrator, often I have to fetch some metadata from all the SQLServers that we have. Other times, I have to execute some DDL or DML on all the servers.

In this blog and shared video, I show how to write a multiple server PowerShell script where server list source could range from raw text files to some inventory-based query result.

This is a serial operation, so you’re hitting one instance at a time. I’ve noticed that Powershell has about a half-dozen ways of performing parallel actions but they all seem to come with at least one fatal flaw.

3 Comments

The Data Streaming Landscape in 2024

Published 2023-12-22 by Kevin Feasel

Kai Waehner gives us an overview of where data streaming technologies are at:

The research company Forrester defines data streaming platforms as a new software category in a new Forrester Wave. Apache Kafka is the de facto standard used by over 100,000 organizations. Plenty of vendors offer Kafka platforms and cloud services. Many complementary open source stream processing frameworks like Apache Flink and related cloud offerings emerged. And competitive technologies like Pulsar, Redpanda, or WarpStream try to get market share leveraging the Kafka protocol. This blog post explores the data streaming landscape of 2024 to summarize existing solutions and market trends. The end of the article gives an outlook to potential new entrants in 2025.

Kai is Kafka-centric but this is a good overview of the industry and worth taking the time to read.

Comments closed

Time Travel in Delta Tables

Published 2023-12-22 by Kevin Feasel

Manish Mishra shows off some of the query capabilities with delta tables:

Delta Time Travel is a feature that is provided by Delta Lake. Delta time travel allows the user to switch to the previous version of the delta table.

Some of the benefits of Delta Time Travel are:

Historical Data Analysis

Rollback to the previous version in case of new data quality is not valid

Supports Schema Evolution

Click through for examples of each of these.

Comments closed

Notebook Concurrency in Microsoft Fabric

Published 2023-12-22 by Kevin Feasel

Ed Oldham takes us through a common problem:

If you are currently using Microsoft Fabric you will have some sort of capacity associated with your account. This will have a large impact on what you can run concurrently. If you are on a Fabric Trial, you will have access to a trial capacity and if you are paying you will be on a certain capacity tier based on how much you pay. The following diagram shows information about each level of capacity and the Trial. The Trial resembles F64 capacity but is apparently different in some important ways (More on that later).

Read on to learn more about capacity and what that means for concurrent notebooks and Spark jobs.

Comments closed

Table Results for DBCC PAGE

Published 2023-12-22 by Kevin Feasel

Andy Yun is pleased:

Am playing around with Always Encrypted for the first time. I was just following along the basic tutorial and encrypted some columns in my AutoDealershipDemo database. But then I decided to go crack open the data page using my friend DBCC PAGE.

Read on to see how you can get the results of DBCC PAGE into a table. My recollection is that there are some limits to what it can write into the table, but it’s pretty good on the whole.

Comments closed

Load Balancing across Azure SQL DBs

Published 2023-12-22 by Kevin Feasel

Jose Manuel Jurado Diaz scales out:

In today’s data-driven landscape, we are presented with numerous alternatives like Elastic Queries, Data Sync, Geo-Replication, ReadScale, etc., for distributing data across multiple databases. However, in this approach, I’d like to explore a slightly different path: creating two separate databases containing data from the years 2021 and 2022, respectively, and querying them simultaneously to fetch results. This method introduces a unique perspective in data distribution — partitioning by database, which could potentially lead to more efficient resource utilization and enhanced performance for each database. While partitioning within a single database is a common practice, this idea ventures into partitioning across databases.

Click through to see what the code looks like for this.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Curated SQL Posts