Curated SQL – Page 251 – A Fine Slice Of SQL Server

Vacuum, MERGE, and ON CONFLICT Directives

Published 2024-09-06 by Kevin Feasel

I previously blogged about ensuring that the “ON CONFLICT” directive is used in order to avoid vacuum from having to do additional work. You can read the original blog here: Reduce Vacuum by Using “ON CONFLICT” Directive

Now that Postgres has incorporated the “MERGE” functionality into Postgres 15 and above, I wanted to ensure that there was no “strange” behavior as it relates to vacuum when using merge.

Read on to see how things look now.

Comments closed

Querying a Fabric SQL Endpoint via Notebook and T-SQL

Published 2024-09-06 by Kevin Feasel

Sandeep Pawar talks about a Spark connector:

I am not sharing anything new. The spark data warehouse connector has been available for a couple months now. It had some bugs, but it seems to be stable now. This connector allows you to query the lakehouse or warehouse endpoint in the Fabric notebook using spark. You can read the documentation for details but below is a quick pattern that you may find handy.

Despite it not being anything new, it is still interesting to see the use case of writing T-SQL instead of Spark SQL.

Comments closed

Data Type Changes in Snowflake

Published 2024-09-06 by Kevin Feasel

Kevin Wilkie makes some changes:

When working with data, I usually have an idea of what type of data I will push into a field. Sometimes, for whatever reason, it is decided to change the type of data allowed in the field. Today, I want to show how that’s done in Snowflake.

Click through to learn how, and how it’s not quite the same as SQL Server.

Comments closed

Exploring Azure Data Storage Options

Published 2024-09-06 by Kevin Feasel

Anurag K talks about data storage:

Choosing the right data storage option in Azure is critical for ensuring your applications run efficiently, securely, and cost-effectively. In this blog post, we’ll dive deeper into three key Azure storage services: Blob Storage, File Storage, and Disk Storage. We’ll explore their features, use cases, and provide specific examples to help you understand how to best leverage these services in your cloud environment.

I would note here that within the Blob storage section, you could carve out an explanation of Data Lake Storage Gen2, which starts from Blob storage and then adds hierarchical namespaces (i.e., folders).

Comments closed

Testing Python Code with pytest

Published 2024-09-05 by Kevin Feasel

Aida Gjoka builds some tests:

Testing code using automated tools is common throughout the software development industry. This technique can improve the quality of the code you write as a data scientist. Testing helps refine your code, supports redesign, prevents errors, and makes it harder to write single-use code.

Here, we introduce the pytest framework and show how it can be used to test Python functions. If you don’t use a testing framework as part of your daily workflow, try experimenting with the techniques presented here the next time you write or extend a function.

I am a big fan of pytest because it strikes what I consider to be a great balance between convention and customization. There’s very little administrative overhead to creating test classes and test cases, so tests are easy to build and can it’s trivial to run a test suite or a specific part of one.

Comments closed

Search in a Vector Database

Published 2024-09-05 by Kevin Feasel

Brendan Tierney continues a series on vector databases:

Searching semantic similarity in a data set is now equivalent to searching for nearest neighbors in a vector space instead of using traditional keyword searches using query predicates. The distance between “dog” and “wolf” in this vector space is shorter than the distance between “dog” and “kitten”. A “dog” is more similar to a “wolf” than it is to a “kitten”.

Click through to learn more about some of the common techniques for performing similarity searches against vectorized data.

Comments closed

Index Rebuilds versus Reorganizations

Published 2024-09-05 by Kevin Feasel

Sergey Gigoyan opens a can of worms:

When data is modified in a Microsoft SQL database, the corresponding indexes are also modified. These modifications cause indexes to become fragmented. Fragmentation means that the logical order of the data inside the index pages does not match its physical order. The more the fragmentation level is increased, the more performance is affected for SELECT statements. With fragmented indexes, the information is not spread logically, making the index’s data retrieving operations more time-consuming resulting in query performance issues. Thus, fragmentation of the indexes should be fixed periodically in order to maintain high performance. Rebuild and Reorganize index operations are aimed at defragmenting indexes.

Click through to learn a bit about why we may need index maintenance, followed by a comparison between index rebuilding versus index reorganization. All that said, I do not have room in my indexing philosophy (or my heart) for reorganization. Reorganization simply doesn’t provide enough value as it is. If you want a really detailed dive into the topic, Jeff Moden put together an amazing talk that we were lucky enough to be able to record. Specifically, about an hour and a quarter in (link to timestamp), Jeff gets to the topic of reorganization and how little it actually does in practice.

Comments closed

Running a Microsoft Fabric Notebook via Azure DevOps

Published 2024-09-05 by Kevin Feasel

Kevin Chant runs a notebook:

In this post I want to share one way that you can run a Microsoft Fabric notebook from Azure DevOps.

You can consider this post a follow-up to my last post about unit tests on Microsoft Fabric items. Since somebody asked me about automating notebooks and I wanted to show it in action.

Please note, currently the ability to call the API that runs a notebook on demand does not support service principals.

Despite that limitation, Kevin shows two ways to authenticate while calling the appropriate API.

Comments closed

Granting Entra ID Guest Users Access to a Semantic Model via Excel

Published 2024-09-05 by Kevin Feasel

Gilbert Quevauvilliers notes an implication:

Yes, you read that right, it is possible where you have shared Power BI semantic model with an external (guest user in Entra ID) they can now connect to the semantic model using Excel.

This has been around for quite a while I just did not know about it, here is the Microsoft documentation: Semantic model connectivity and management with the XMLA endpoint in Power BI – Power BI | Microsoft Learn

The first thing to note is that this currently only works in the Excel Desktop App.

I did try using Excel on the Web and unfortunately that did not work.

I then created my Excel file, uploaded to SharePoint Online and tried to interact with the Excel Pivot table and that too did not work.

Ok enough of the limitations let me show you how to get it working.

Click through to see how. Gilbert also has an important note about row-level security, co read the whole thing.

Comments closed

T-SQL Snapshot Point-in-Time Recovery to Azure VM

Published 2024-09-05 by Kevin Feasel

Anthony Nocentino continues a series on T-SQL snapshot backups:

In this post, the third in our series on using T-SQL Snapshot Backup, I will guide you through using the new T-SQL Snapshot Backup feature in SQL Server 2022 to take a snapshot backup and then perform point-in-time database restores using that snapshot backup as the base, but this time using an Azure Virtual Machine. We will explore how to manage Azure storage-level operations, such as taking snapshots, cloning snapshots, and executing an instantaneous point-in-time database restore from the snapshot with minimal impact on your infrastructure. Additionally, I will demonstrate a PowerShell script that utilizes dbatools and Azure Az modules to automate the process.

Read on for the script and plenty of details.

Comments closed

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

Curated SQL Posts