Curated SQL – Page 462 – A Fine Slice Of SQL Server

We received some support cases when customers encounter the error below while trying to add an AAD security group to their Azure SQL Database or Azure SQL managed instance. In this blog article, we will be listing a few points to be checked to troubleshoot this error and can help you to identify the cause.
Msg 33130, Level 16, State 1, Line 1
Principal ‘XXXXXX’ could not be found or this principal type is not supported.

Read on for several reasons why things might be going wrong.

Comments closed

Connecting to Azure Blob Storage with SQL Server 2022

Published 2022-09-06 by Kevin Feasel

I take a look back at the past and forward to the future:

PolyBase used to connect to Azure Blob Storage. Specifically, you could use the wasbs:// protocol and connect to Azure Blob Storage over WebHDFS. Here’s an example of an external data source which would work for SQL Server 2016 through 2019:

SQL Server 2022 changes its mechanisms around Azure Blob Storage a little bit, though I think the changes are sensible.

Comments closed

Cell-Level (aka Column-Level) Encryption in SQL Server

Published 2022-09-06 by Kevin Feasel

Etienne Lopes takes us through cell-level encryption:

Securing data has always been important but as time goes by, more and more data is available all around us, some of it is considered sensitive data and it becomes a major concern to protect it somehow, in fact in certain cases it is legally mandatory to comply with certain regulations (like GDPR). SQL Server offers a few options regarding data protection (either by means of encryption or obfuscation), cell level encryption being one of them.
In this post I’ll explain what is cell level encryption along with its use cases and I’ll use a thorough demo to show how to implement it in a database and how it works

Despite Always Encrypted being available, I’m more apt to use column-level/cell-level encryption because it’s easier to implement. Well, that and the lack of Always Encrypted in Azure Synapse Analytics.

Comments closed

Finding SQL Server Indexes with Unused INCLUDE Columns

Published 2022-09-06 by Kevin Feasel

Dave Mason does some digging:

Periodic index analysis for SQL Server typically involves tasks such as checking for missing/unused/overlapping indexes, checking for heaps that maybe should have been designed with a clustered index, analyzing ROW/PAGE compression, etc. There are numerous DMVs that you can use as a starting point for those tasks. There’s also some good open source tools and scripts that members of the SQL Server community have created and shared. One task that I don’t recall ever seeing was an analysis of INCLUDE columns for nonclustered indexes. What I really wanted to do was to find INCLUDE columns that were never being used, and remove them from index definitions.

Click through to see what Dave learned, as well as the repo containing several useful scripts.

Comments closed

Getting a Power BI Dataset’s Refresh History

Published 2022-09-06 by Kevin Feasel

Chris Webb wants to know the lore:

In the last post in this series I showed how to get the status of an individual Power BI dataset refresh. In this post I’ll show how to get the history of a dataset’s refreshes and save it to a file in OneDrive for Business – useful if you want to do some analysis on refresh performance or the causes of failures.
This is one of the most straightforward things to do with the Power BI Enhanced Refresh API , especially once you’ve already got a custom connector created in the way I’ve shown in the previous posts in this series. You just need to add a new Action to your custom connector, fill in the General section:

Click through to see what to do.

Comments closed

Power BI: Merge Joins and Nullable Columns

Published 2022-09-06 by Kevin Feasel

Chris Bailiss dives into some join problems in Power BI against Snowflake and SQL Server:

Power BI generates SQL that performs very poorly against Snowflake (and possibly other database platforms) when joining nullable columns, e.g. in the “Merge Queries” action in the Power BI query editor.
Specifically, the SQL generated prevents the use of the “hash join” physical join algorithm that has been one of the main algorithms used for decades by database engines when executing joins between large tables. This forces database engines to fall-back to using very inefficient physical joins (e.g. loop/cartesian join) that can perform orders of magnitude slower.
The reason why Power BI generates different SQL for non-nullable for nullable columns in a join is explained in the “Problem Part 2” section below.

Read the whole thing—and there’s a lot of good information in there.

Comments closed

Recursive Flows in Power Automate

Published 2022-09-06 by Kevin Feasel

Imke Feldmann starts a new series:

If you want to traverse organisational hierarchies, walk through nested folder structures or create nested Power BI metrics from an Excel table for example, you need some sort of recursive logic. In this article I will introduce you to the basic method of recursive flows in Power Automate. In upcoming articles I will share some of the flows for the above mentioned use cases.

Read on for an intuitive understanding of what recursion is as well as how you can implement it in a Power Automate flow.

Comments closed

Happy Labor Day

Published 2022-09-05 by Kevin Feasel

Curated SQL is taking the day off for Labor Day. If you’re in the US, hopefully you enjoy the holiday. If you’re not in the US, you’re still welcome to enjoy the holiday.

Comments closed

RStudio Connect and Python’s FastAPI

Published 2022-09-02 by Kevin Feasel

Parisa Gregg continues a series on RStudio Connect and Python:

FastAPI is a light web framework and as you can probably tell by the name, it’s fast. It provides a similar functionality to Flask in that it allows the building of web applications and APIs, however it is newer and uses the ASGI (Asynchronous Server Gateway Interface) framework. One of the nice features of FastAPI is it is built on OpenAPI and JSON Schema standards which means it has the ability to provide automatic interactive API documentation with SwaggerUI. You also get validation for most Python data types with Pydantic. FastAPI is therefore another popular choice for data scientists when creating APIs to interact with and visualize data.
In this blog post we will go through how to deploy a simple machine learning API to RStudio Connect.

I’ve taken pretty well to FastAPI for rapid API development. I haven’t had to worry about scaling it out too much, so I’m not sure how well that works in practice. Still, for single-user or few-user apps, FastAPI definitely works well.

Comments closed

Replacing ZooKeeper with Kafka Raft

Published 2022-09-02 by Kevin Feasel

Dave Shook shows off the Kafka Raft protocol:

Apache Kafka^® Raft (KRaft) is the consensus protocol that was introduced to remove Apache Kafka’s dependency on ZooKeeper™ for metadata management. This greatly simplifies Kafka’s architecture by consolidating responsibility for metadata into Kafka itself, rather than splitting it between two different systems: ZooKeeper and Kafka. KRaft mode is available in the Apache Kafka 3.1 release, though it is not yet ready for use in production environments. Refer to KIP-833 to learn more about when KRaft will be marked as production ready.
Below are several key resources to help you learn everything you need to know about the ins and outs of KRaft, its role in the Kafka architecture, and how you can get started in trying it out. These resources are followed by two others that let you “do stuff” with Kraft, i.e., run a KRaft mode cluster and observe how it works when various cluster controller related operations take place.

Eliminating the ZooKeeper dependency was a big goal for the Kafka team for several years. It’s not quite out yet but I’ll be interested to see how that migration works for companies.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Curated SQL Posts