Month: April 2025

Fine-Tuning a DistilBERT Model for Question Answering

Muhammad Asad Iqbal Khan builds upon a simple model:

The transformers library provides a clean and well-documented interface for many popular transformer models. Not only does it make the source code easier to read and understand, it also provides a standardized way to interact with the model. You saw in the previous post how to use a model such as DistilBERT for natural language processing tasks. In this post, you will learn how to fine-tune the model for your own purpose. This expands the use of the model from inference to training. Specifically, you will learn:

  • How to prepare the dataset for training
  • How to train a model using a helper library

DistilBERT is a major simplification of BERT, but it comes with the advantage that it’s very easy to train on modest hardware and performance is in the same realm of acceptability as the full BERT model. Switching from DistilBERT to BERT isn’t as easy as just swapping out model classes, though it’s pretty close.
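The fiddliest part of preparing a question-answering dataset is turning each answer's character position in the context into the start and end token positions the model learns to predict. Below is a minimal pure-Python sketch of that step, not the code from the linked post. It assumes a fast tokenizer has already produced per-token character offsets (Hugging Face fast tokenizers return these with `return_offsets_mapping=True`); here the offsets are hand-written mock values, with `None` marking special and question tokens. The function name is hypothetical, and returning `(0, 0)` for an answer that was truncated away follows a common convention of pointing at the `[CLS]` position.

```python
from typing import List, Optional, Tuple

# One (char_start, char_end) pair per token; None marks special/question tokens
# so they can never be chosen as part of the answer.
Offset = Optional[Tuple[int, int]]


def answer_to_token_positions(
    offsets: List[Offset], answer_start: int, answer_text: str
) -> Tuple[int, int]:
    """Map an answer's character span in the context to (start_token, end_token).

    Returns (0, 0), the [CLS] position, when the answer does not fall entirely
    inside the tokens provided (for example, it was truncated away).
    """
    answer_end = answer_start + len(answer_text)  # exclusive character end
    start_tok = end_tok = None
    for i, span in enumerate(offsets):
        if span is None:
            continue
        s, e = span
        # First token whose span contains the answer's first character.
        if start_tok is None and s <= answer_start < e:
            start_tok = i
        # Token whose span contains the answer's last character.
        if s < answer_end <= e:
            end_tok = i
    if start_tok is None or end_tok is None:
        return (0, 0)
    return (start_tok, end_tok)


if __name__ == "__main__":
    context = "The cat sat on the mat."
    # Mock offsets: [CLS], two question tokens, the context tokens, then [SEP].
    offsets = [None, None, None, (0, 3), (4, 7), (8, 11), (12, 14), (15, 18), (19, 22), (22, 23), None]
    answer = "the mat"
    start_char = context.index(answer)
    print(answer_to_token_positions(offsets, start_char, answer))  # (7, 8)
```

These start and end positions are the labels the training loop compares against the model's predicted span.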

The Power of Virtual Environments in Python

I have a new video:

In this video, I explain why virtual environments are such an important concept in Python and why you should generally be using them. I also talk about virtual environments versus Docker containers and how these are not mutually exclusive.

It took me a while to understand why virtual environments make sense, and I think part of the difficulty in adapting to this mental model was that I was used to the .NET mechanism for package management: per-project library installation. Sure, there was the Global Assembly Cache (GAC) in .NET Framework, and that had similar problems to installing packages in base Python installations, but we didn’t use it that often. Or at least, I’ve sublimated however many hours I spent fighting the GAC to the point that I don’t remember them anymore.
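To make the per-project idea concrete, here is a small sketch using only the standard library's `venv` module, the same machinery behind `python -m venv .venv`. It creates an environment in a temporary directory and shows the usual check for whether the running interpreter is inside a virtual environment. The helper names and the `.venv` folder name are my own choices for illustration, not anything from the video.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtual_environment() -> bool:
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def create_project_env(project_dir: Path) -> Path:
    """Create an isolated environment in <project_dir>/.venv and return its path."""
    env_dir = project_dir / ".venv"
    # with_pip=False keeps this quick; pass with_pip=True to bootstrap pip so
    # packages install into this environment instead of the base Python.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    print("Running inside a virtual environment:", in_virtual_environment())
    with tempfile.TemporaryDirectory() as tmp:
        env = create_project_env(Path(tmp))
        print("pyvenv.cfg exists:", (env / "pyvenv.cfg").exists())
```

Each project gets its own `.venv`, so upgrading a package for one project cannot break another, which is roughly the per-project installation model .NET developers are used to.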

The New Fabric CLI

Hasan Abo Shally announces a CLI:

  • The Fabric CLI is now in preview
  • It offers a developer-first, file-system-inspired way to explore and manage Microsoft Fabric
  • Use it interactively or script it into your workflows — from your terminal, in seconds
  • Built on Fabric APIs, designed for automation, and constantly evolving
  • Open source is on the horizon — with plans to empower the community to extend and shape the CLI

Give it a try. Break things. Tell us what you want next.

Click through for the full announcement. The idea here is for this to be the az cli for Fabric. Between this and Semantic Link Labs, automating tasks in Microsoft Fabric should get easier.

A New Dashboard for Distributed Availability Groups

David Fowler has been busy:

This comes off the back of my last post looking at using a distributed availability group (DAG) to help facilitate a SQL Server migration: SQL Server Migration Using a Distributed Availability Group.

One thing that I mentioned in that post was that, although SSMS gives us a nice dashboard to check the health of our regular AGs, there’s nothing there to show the state that the DAGs are in. The only choice that we’ve got is to query and compare results from a couple of DMVs on each side.

David has met that demand. Read on to see what the solution includes and how you can get your hands on it.
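Without a dashboard, the manual check described above amounts to querying `sys.dm_hadr_database_replica_states` on each side of the DAG and comparing the results. The Python sketch below covers only the comparison half, assuming the rows have already been fetched (and joined to `sys.databases` for names) from the global primary and from the forwarder. The sample rows and helper names are hypothetical, `last_commit_time` gives only a rough sense of lag, and this is not how David's dashboard works.

```python
from datetime import datetime
from typing import Dict, List

# States treated as healthy; asynchronous DAG links normally report SYNCHRONIZING.
HEALTHY_STATES = {"SYNCHRONIZED", "SYNCHRONIZING"}


def commit_lag_seconds(global_primary: List[dict], forwarder: List[dict]) -> Dict[str, float]:
    """Seconds each forwarder database trails the global primary, by last_commit_time."""
    primary_by_db = {row["database_name"]: row for row in global_primary}
    lag: Dict[str, float] = {}
    for row in forwarder:
        source = primary_by_db.get(row["database_name"])
        if source is None:
            continue  # database not present on the primary side
        delta = source["last_commit_time"] - row["last_commit_time"]
        # Clamp at zero in case the two snapshots were taken at slightly different moments.
        lag[row["database_name"]] = max(delta.total_seconds(), 0.0)
    return lag


def unhealthy_databases(rows: List[dict]) -> List[str]:
    """Names of databases whose synchronization state needs a closer look."""
    return [r["database_name"] for r in rows if r["synchronization_state_desc"] not in HEALTHY_STATES]


if __name__ == "__main__":
    # Hypothetical rows, shaped like the DMV output on each side.
    primary = [
        {"database_name": "Sales", "last_commit_time": datetime(2025, 4, 1, 12, 0, 30), "synchronization_state_desc": "SYNCHRONIZED"},
        {"database_name": "HR", "last_commit_time": datetime(2025, 4, 1, 12, 0, 0), "synchronization_state_desc": "SYNCHRONIZED"},
    ]
    forwarder = [
        {"database_name": "Sales", "last_commit_time": datetime(2025, 4, 1, 12, 0, 25), "synchronization_state_desc": "SYNCHRONIZING"},
        {"database_name": "HR", "last_commit_time": datetime(2025, 4, 1, 12, 0, 0), "synchronization_state_desc": "NOT SYNCHRONIZING"},
    ]
    print(commit_lag_seconds(primary, forwarder))  # {'Sales': 5.0, 'HR': 0.0}
    print(unhealthy_databases(forwarder))  # ['HR']
```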

Calling a Microsoft Fabric REST API via Azure Data Factory

Koen Verbeeck makes the call:

Suppose you want to call a certain Microsoft Fabric REST API endpoint from Azure Data Factory (or Synapse Pipelines). This can be done using a Web Activity, and most Fabric APIs now support service principals or managed identities. Let’s illustrate with an example. I’m going to call the REST API endpoint to create a new lakehouse. 

Click through for the instructions.
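For orientation, the Web Activity ends up issuing a single HTTP POST. To the best of my knowledge, the Fabric REST API's create-lakehouse endpoint is `POST https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/lakehouses` with a JSON body containing `displayName`; check the Fabric REST API reference before relying on it. The Python sketch below only builds that request with the standard library so you can see the pieces the activity needs (URL, method, headers, body); it sends nothing. The workspace ID, lakehouse name, and token are placeholders. In Data Factory, managed identity or service principal authentication on the Web Activity supplies the bearer token, so the explicit `token` argument is an assumption for running this outside ADF.

```python
import json
import urllib.request
from typing import Optional

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_create_lakehouse_request(
    workspace_id: str, display_name: str, token: str, description: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) the POST a Web Activity would issue to create a lakehouse."""
    url = f"{FABRIC_API}/workspaces/{workspace_id}/lakehouses"
    payload = {"displayName": display_name}
    if description:
        payload["description"] = description
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # placeholder; ADF handles auth itself
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_create_lakehouse_request("<workspace-id>", "demo_lakehouse", "<token>")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

The same URL, method, and body map directly onto the URL, Method, and Body settings of the Web Activity.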

Deploying and Using Custom Python Libraries in Microsoft Fabric

Miles Cole picks up from part one:

This is part 2 of my prior post that continues where I left off. I previously showed how you can use Resource folders in either the Notebook or Environment in Microsoft Fabric to do some pretty agile development of Python modules/libraries.

Now, how exactly can you package up your code to distribute and leverage it across multiple Workspaces or Environment items? How could we accomplish something like the below?

Read on for the answer.

Working around Errors Migrating to Azure SQL Managed Instance

Ben Johnston has an after-action report:

I was recently on a project to migrate a very transactional installation of SQL Server to Azure SQL Managed Instance (MI). SQL Managed Instance is a good stepping stone between a full, on-prem SQL instance / Azure VM and an Azure SQL Database. It has most of the functionality of a full, on-prem instance, with management of the SQL engine, backups, OS and underlying hardware done by Microsoft. It allows you to use cross database queries and run SQL Agent jobs, with fewer limitations than Azure SQL Database migrations.

The migration process isn’t completely seamless. During the migration of this system, we encountered several surprises. Hopefully, this will help you avoid, or at least be prepared for, these differences from the on-prem version. This also reinforces the importance of testing each aspect of your migration.

This is part one of a two-parter and focuses on issues during the deployment process. Ben promises a follow-up with post-deployment issues you could run into. I expect that’s where the “What is this performance?” issues will come into play.

Dealing with Arrays in SQL and jOOQ

Lukas Eder covers mapping functions:

ARRAY types are a part of the ISO/IEC 9075 SQL standard. The standard specifies how to:

  • Construct arrays
  • Nest data into arrays (e.g. by means of aggregation or subqueries)
  • Unnest data from arrays into tables

But it is very unopinionated when it comes to function support. The ISO/IEC 9075-2:2023(E) 6.47 <array value expression> specifies concatenation of arrays, whereas the 6.48 <array value function> section exclusively lists a not extremely useful TRIM_ARRAY function (which you can use to remove the last N elements of an array, something I have yet to encounter a use case for).

There are a few database platforms that support the ARRAY type, as Lukas lays out.
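Python lists make a handy stand-in for seeing what those few standard operations do. The sketch below is not jOOQ and not any database's implementation; it just models array concatenation (`||`), `TRIM_ARRAY`, and `UNNEST ... WITH ORDINALITY` on lists. SQL array positions start at 1, which is why the ordinality column starts there. Treating an out-of-range trim count as an error is an assumption on my part, matching my understanding of the standard's exception and of PostgreSQL's `trim_array`.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def array_concat(a: Sequence[T], b: Sequence[T]) -> List[T]:
    # Models a || b: the elements of a followed by the elements of b.
    return list(a) + list(b)


def trim_array(a: Sequence[T], n: int) -> List[T]:
    # Models TRIM_ARRAY(a, n): drop the last n elements.
    # Assumption: n outside 0..cardinality is an error rather than a no-op.
    if not 0 <= n <= len(a):
        raise ValueError("n must be between 0 and the array's cardinality")
    # Slice with an explicit stop: a[:-0] would be empty, not the whole array.
    return list(a[: len(a) - n])


def unnest_with_ordinality(a: Sequence[T]) -> List[Tuple[T, int]]:
    # Models UNNEST(a) WITH ORDINALITY: one row per element plus its 1-based position.
    return [(value, position) for position, value in enumerate(a, start=1)]


if __name__ == "__main__":
    print(array_concat([1, 2], [3]))  # [1, 2, 3]
    print(trim_array([1, 2, 3, 4], 2))  # [1, 2]
    print(unnest_with_ordinality(["a", "b"]))  # [('a', 1), ('b', 2)]
```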

The Overhead Cost of Kubernetes

Steve Jones shares some thoughts:

A report of cloud Kubernetes usage shows that these resources are underutilized, over-provisioned, and costing more than necessary for many organizations. From the previous year, average CPU utilization declined from 13% to 10%, and memory utilization is only around 23%. Companies are over-provisioning their clusters, which is understandable. No one wants to have systems overloaded and users complaining about performance.

Steve goes on to list some of the challenges of running an orchestrator like Kubernetes (or OpenShift or whatever). There’s a lot of code and process behind them, and that can be challenging if you don’t have administrators who know what they’re doing. Even hosting in Azure Kubernetes Service or Amazon Elastic Kubernetes Service only removes some of the systems management pain. That said, there is a certain level of comfort in knowing that my applications will automatically restart if a problem occurs, so the pain is usually worth it.
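To put those numbers in perspective: at 10% average CPU utilization, a cluster that has reserved 100 cores is doing roughly 10 cores' worth of work. One common way to right-size is to base resource requests on a high percentile of observed usage plus some headroom, rather than on the worst case anyone can imagine. The sketch below shows that arithmetic; the helper names, the default percentile and headroom, and the sample values are all hypothetical and not taken from the report.

```python
import math
from typing import Sequence


def utilization(used: float, requested: float) -> float:
    """Fraction of requested capacity actually in use (0.10 means 10%)."""
    return used / requested


def right_sized_request(samples: Sequence[float], percentile: float = 0.95, headroom: float = 0.2) -> float:
    """Suggest a resource request from observed usage samples.

    Takes the nearest-rank percentile of the samples and adds a safety margin,
    so typical spikes still fit without reserving capacity that sits idle.
    """
    if not samples:
        raise ValueError("need at least one usage sample")
    if not 0 < percentile <= 1:
        raise ValueError("percentile must be in (0, 1]")
    ordered = sorted(samples)
    rank = max(math.ceil(percentile * len(ordered)), 1)  # nearest-rank method
    return ordered[rank - 1] * (1 + headroom)


if __name__ == "__main__":
    print(f"{utilization(10, 100):.0%}")  # 10%
    # Hypothetical per-pod CPU usage samples, in cores.
    cpu_samples = [0.8, 1.0, 1.2, 0.9, 3.0, 1.1, 1.0, 0.95, 1.05, 1.3]
    print(round(right_sized_request(cpu_samples, percentile=0.5), 2))  # 1.2
```

The choice of percentile is the real trade-off: a higher one (and more headroom) buys safety against spikes at the cost of the idle capacity the report is complaining about.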

Avoid using sysadmin Accounts for Linked Servers

Denny Cherry shares sound advice:

When setting up linked servers, the accounts used for the linked server logins should have the lowest permissions needed for whatever the users on the source side of the linked server need to do. Over time, this will mean changing the permissions of the linked server or even setting up multiple linked servers that all point to the same target server so that different applications don’t have permission to access each other’s databases over the linked server. The one thing that you never want to do is to use a login for the linked server that has sysadmin rights on the target instance, especially if that linked server is available for everyone on the server to use.

Click through to understand why.