Press "Enter" to skip to content

Curated SQL Posts

Models and Endpoints in Azure AI Foundry

Tomaz Kastrun continues a series on Azure AI:

Models from the model catalog can be deployed using programming languages or using the Foundry studio.

Model deployment has two types: Deploy from the base model or deploy from the fine-tuned model. The difference is that fine-tuned model is model taken from the model catalog and later tuned to an additional dataset, as the base model is the model as it is available in Azure AI Foundry.

Click through for a bit more information on the process.

Leave a Comment

Switching between Python and PySpark Notebooks in Fabric

Sandeep Pawar wants to save some money:

File this under a test I have been wanting to do for some time. If I am exploring some data in a Fabric notebook using PySpark, can I switch between Python and PySpark engines with minimal code changes in an interactive session? The goal is to use the Python notebook for some exploration or use existing PySpark/SparkSQL or develop the logic in a low compute environment (to save CUs) and scale it in a distributed Spark environment. Understandably, there will be limitations with this approach given the difference in environments, configs etc., but can it be done?

Read on for the answer, as well as plenty of notes around it.

Leave a Comment

Partitioned Tables and Faster Performance

Henrietta Dombrovskaya provides a warning:

I am unsure whether others have this problem, but I find myself explaining the benefits9and what’s not)of partitioning tables in Postgres over and over again.

Most times, developers have unrealistic expectations that if they partition a gigantic table, “it will be faster to select from it.” I always have to explain that the goal and the purpose of partitioning is improving maintenance, not query optimization, and if the execution speed of short queries remains the same as before partitioning, they should consider it a success (yes, there are no rules without exceptions, and there are cases of performance improvement, but those are exceptions indeed).

Henrietta’s example is specifically around Postgres, but it applies to SQL Server all the same: partitioning is primarily a maintenance benefit rather than a performance benefit.

Leave a Comment

The Benefits of Read Committed Snapshot Isolation

Haripriya Naidu gets optimistic:

In this article, we are going to take a look at how write operations block read operations under default isolation level – Read Committed and how to avoid this using the optimistic isolation level – Read Committed Snapshot.

I do wish that this were the default for new databases in SQL Server, like it is in Azure SQL Database.

H/T Brent Ozar’s newsletter for pointing me in Haripriya’s direction.

Leave a Comment

Calling IFTTT from Powershell

Denny Cherry triggers an action:

Sometimes, you want/need to have PowerShell do an action, but it’s hard to do through PowerShell, or it can’t be done at all. If the website If This Then That (IFTTT) can do it instead, you can still do it through PowerShell, with a little extra work. The way to do this will be to call a webhook on an IFTTT Applet and pass the information you need to send to the third-party service. A great example of this would be if you need to automate posting a new post to FaceBook. Posting to Facebook is shockingly hard as you have to write all the pieces to post to Facebook manually, and there’s no current way to post to Facebook via PowerShell. However, you can call an IFTTT Action and have that action do the actual posting to Facebook for you.

Click through for a generic form of the process.

Leave a Comment

Using complete.cases in R

Steven Sanderson has no time for missing data:

Data analysis in R often involves dealing with missing values, which can significantly impact the quality of your results. The complete.cases function in R is an essential tool for handling missing data effectively. This comprehensive guide will walk you through everything you need to know about using complete.cases in R, from basic concepts to advanced applications.

Using complete.cases to find observations with missing values is great. Using it to eliminate observations with missing values can sometimes be helpful, depending on just how many missing values you have.

Leave a Comment

Shrinking ONNX Files

Pete Warden breaks out the shrink ray:

I’ve been using the ONNX Runtime a lot recently, and while it has been a lot of fun, there are a few things I’ve missed from the TensorFlow Lite world. The biggest (no pun intended) is the lack of tools to shrink the model file size, something that’s always been essential in the mobile app world. You can quantize using the standard ONNX tools, but in my experience you’ll often run into accuracy problems because all of the calculations are done at lower precision. These are usually fixable, but require some time and effort.

Read on for Pete’s preferred alternative and a new tool to help with this.

Leave a Comment

Scanning Fabric Workspaces via Semantic Link Labs

Sandeep Pawar takes us through the Scanner API:

It’s finally here! Thanks to Michael Kovalsky, one of the most requested & anticipated APIs in now available in Semantic Link Labs (v0.8.10) – the Scanner API. The Scanner API in Fabric Admin REST APIs allows Fabric administrators to retrieve detailed metadata about their organization’s Fabric items, supporting governance and compliance efforts. It provides information such as item names, descriptions, date created, lineage, connection strings etc. It’s not new, we have been using it in Power BI for a long time but in the Fabric world, it’s even more important given the number of items and configurations.

Read on to see what’s available and how this works.

Leave a Comment