Day: September 3, 2025

Making XGBoost Run Faster

Ivan Palomares Carrascosa shares a few tips:

Extreme gradient boosting (XGBoost) is one of the most prominent machine learning techniques used not only for experimentation and analysis but also in deployed predictive solutions in industry. An XGBoost ensemble combines multiple models to address a predictive task like classification, regression, or forecasting. It trains a set of decision trees sequentially, gradually improving the quality of predictions by correcting the errors made by previous trees in the pipeline.

In a recent article, we explored the importance and ways to interpret predictions made by XGBoost models (note we use the term ‘model’ here for simplicity, even though XGBoost is an ensemble of models). This article takes another practical dive into XGBoost, this time by illustrating three strategies to speed up and improve its performance.

Read on for two tips to reduce operational load and one to offload it to faster hardware (when possible).
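As a rough illustration of the "each tree corrects the errors of the previous ones" idea in the excerpt, here is a minimal sketch in plain Python. It is not XGBoost itself: it assumes squared-error loss, a single numeric feature, and one-split trees (stumps), and it leaves out XGBoost's regularization and second-order terms. The names `fit_stump`, `predict_stump`, and `boost` are hypothetical helpers made up for this example.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Pick the threshold on xs whose two-leaf split best fits the residuals."""
    best = None
    # Skip the largest value so the right-hand leaf is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, rounds=20, learning_rate=0.3):
    """Fit stumps one after another, each on the current residuals."""
    base = mean(ys)
    predictions = [base] * len(xs)
    stumps, history = [], []
    for _ in range(rounds):
        # Residuals are the errors left over by all earlier stumps.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
        history.append(mean([(y - p) ** 2 for y, p in zip(ys, predictions)]))
    return base, stumps, history


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x * x for x in xs]
base, stumps, history = boost(xs, ys)
print(f"MSE after round 1: {history[0]:.2f}, after round {len(history)}: {history[-1]:.2f}")
```

The training error recorded in `history` never increases from one round to the next, which is the sequential-correction behavior the excerpt describes. The speed-up strategies in the linked article concern the real library and are not reproduced here.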

Regaining Access to sa on SQL Server

Garry Bargsley has a method:

Have you ever inherited a SQL Server instance or been called in to troubleshoot, only to discover that no one has SysAdmin access? It happens more often than you’d think. Clients reach out, needing urgent work done, but the SA password is long forgotten, and no other account has elevated permissions.

Unfortunately, SQL Server doesn’t offer a “reset on next login” option for SQL-authenticated accounts. So what can you do?

Read on for one technique. I also covered a similar method from Tim Radney, so you can see the ‘raw’ way (Tim) or the dbatools way (Garry).

Comparing Microsoft Fabric Consumption for Notebooks and Warehouse SQL Queries

Gilbert Quevauvilliers performs a comparison:

I saw that there was an update where it is now possible to use the Microsoft Fabric Warehouse to copy data directly from OneLake into the Warehouse.

This got me thinking about which approach would consume more capacity to get the data into the Warehouse table, as well as which one would be faster.

To do this I am going to be running a SQL query in the Warehouse.

Next, I will use a Notebook to copy the data from the OneLake files section to a Warehouse table.

Gilbert’s specific query involves loading data from a variety of CSV files into a lakehouse via notebook, and then into a warehouse table. Read on for the results.

Dataflows Gen2 Tips and Tricks

Jon Vöge provides advice on the least beloved ELT process:

Dataflows Gen2 are frequently (and often rightfully so) bashed for their performance inefficiencies, especially in comparison with other ingestion and transformation tools in Fabric (Notebooks, Pipelines, Copy Jobs, SPROCs).

The fact remains, however, that in the hands of a self-service developer, they are an incredibly powerful tool – if you can spare the compute on your capacity.

In this article, I will highlight tips and tricks to make the most of working with Dataflows Gen2 in Fabric. The list is by no means exhaustive, but simply consists of a bunch of tips which I found useful in the past year, including new and overlooked features, as well as old best practices.

Read on for some things that are new to Dataflows Gen2, working with SharePoint, and making data loads not quite as slow.

Statistics on Partitioned Tables in PostgreSQL

Laurenz Albe gathers stats:

I recently helped a customer with a slow query. Eventually, an ANALYZE on a partitioned table was enough to fix the problem. This came as a surprise for the customer, since autovacuum was enabled. So I decided to write an article on how PostgreSQL collects partitioned table statistics and how they affect PostgreSQL’s estimates.

Read on to see how it works and how you can generate statistics at the table level and not just the partition level.

The Internals of a Hash Table

Hugo Kornelis digs deep:

In part 1 of this series, I laid the foundation to explore the structure of the hash table, as used by the Hash Match operator, by alleging and then proving that a Hash Match (Left Outer Join) returns unmatched rows from the build input in the order in which they are stored in the hash table. This means that we can create queries on carefully curated data to gain insight into the structure of that hash table.

It is now time to use that trick to actually start to explore the hash table. But not without also looking at available documentation and common sense.

Click through for a waltz down memory lane, a graphical interpretation of a hash table, and some tests to see if Hugo is correct.
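To make the "unmatched build rows come back in hash table order" observation concrete, here is a toy model in Python. Everything about the layout is an assumption for illustration: a fixed number of buckets, a modulo hash function, and rows chained in arrival order within a bucket. It is not a description of how SQL Server actually organizes its hash table, which is what Hugo's series sets out to discover. The function name `left_outer_hash_join` is hypothetical.

```python
def left_outer_hash_join(build_rows, probe_rows, bucket_count=4):
    """Toy left outer hash join that preserves all build-side rows."""
    # Build phase: place every build row in a bucket chosen by key % bucket_count.
    buckets = [[] for _ in range(bucket_count)]
    for key, payload in build_rows:
        buckets[key % bucket_count].append(
            {"key": key, "payload": payload, "matched": False}
        )

    # Probe phase: emit matches and remember which build rows were hit.
    output = []
    for key, payload in probe_rows:
        for entry in buckets[key % bucket_count]:
            if entry["key"] == key:
                entry["matched"] = True
                output.append((entry["key"], entry["payload"], payload))

    # Final phase: walk the table bucket by bucket and emit unmatched build rows.
    # Their order reflects the table layout, not the input order.
    for bucket in buckets:
        for entry in bucket:
            if not entry["matched"]:
                output.append((entry["key"], entry["payload"], None))
    return output


build = [(5, "e"), (2, "b"), (8, "h"), (1, "a"), (4, "d")]
probe = [(2, "x")]
for row in left_outer_hash_join(build, probe):
    print(row)
```

With these inputs the unmatched keys come out as 8, 4, 5, 1: neither the order they were inserted nor sorted order, but the order of the buckets they landed in. Choosing build-side data so that this output order reveals the bucket assignment is the same kind of trick the excerpt describes, applied to a much simpler structure.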

More Fun with Page Latches

Jared Poche continues a series on page latches:

In my previous blog, I set up a database with two tables, one with a large CHAR(8000) field and one with a smaller VARCHAR(100) field. Both tables use an INT IDENTITY column for their primary key. Since we’ll be inserting rows sequentially, we will see page latch contention when multiple threads attempt to insert.

We ran some initial tests with SQLQueryStress to create some page latch contention and resolved an odd problem causing connection delays.

We’ll use these two tables and test several different approaches to reduce page latch contention.

Jared shows the results for a variety of different tests and even has an embedded Excel spreadsheet, which is how you know he’s done his homework.