Author: Kevin Feasel

Kernel Methods in Python

Matthew Mayo does a bit of kernel work:

Kernel methods are a powerful class of machine learning algorithms that allow us to perform complex, non-linear transformations of data without explicitly computing the transformed feature space. These methods are particularly useful when dealing with high-dimensional data or when the relationship between features is non-linear.

Kernel methods rely on the concept of a kernel function, which computes the dot product of two vectors in a transformed feature space without explicitly performing the transformation. This is known as the kernel trick. The kernel trick allows us to work in high-dimensional spaces efficiently, making it possible to solve complex problems that would be computationally infeasible otherwise.

Read on for the pros and cons of kernel methods and a pair of techniques that use them.
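
To make the trick concrete, here's the standard worked example (mine, not from the article) for the degree-2 polynomial kernel in two dimensions. Take the feature map φ(x) = (x₁², √2·x₁x₂, x₂²). Then for two vectors x and z:

    φ(x)·φ(z) = x₁²z₁² + 2·x₁x₂·z₁z₂ + x₂²z₂²
              = (x₁z₁ + x₂z₂)²
              = (x·z)²

So the kernel K(x, z) = (x·z)² costs one two-dimensional dot product and a square, yet it equals a genuine dot product in the three-dimensional transformed space. For degree d in n dimensions, the explicit feature space has on the order of n^d coordinates, while the kernel computation stays O(n).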

The Ephemeral Nature of Index Rebuilds on RCSI and ADR

Brent Ozar lays out an argument:

Accelerated Database Recovery (ADR) is a database-level feature that makes transaction rollbacks nearly instantaneous. Here’s how it works.

Without ADR, when you update a row, SQL Server copies the old values into the transaction log and updates the row in-place. If you roll that transaction back, SQL Server has to fetch the old values from the transaction log, then apply them to the row in-place. The more rows you’ve affected, the longer your transaction will take.

With ADR, SQL Server writes a new version of the row inside the table, leaving the old version in place as well.

Because you’re a smart cookie, you immediately recognize that storing multiple versions of a row inside the same table is going to cause a storage problem: we’re going to be boosting the size of our table, quickly. However, the problem’s even bigger than that, and it starts right from the beginning when we load the data.

This was an interesting analysis, looking at table growth with ADR + RCSI, with ADR or RCSI alone, and with neither feature. Given that I’m all-in on RCSI, this is particularly interesting to me. And if you want to dig really deeply into index maintenance, Jeff Moden has a fantastic set of presentations, which TriPASS recorded in 2021: GUIDs vs Fragmentation and LOB data. These two presentations provide sound footing for deciding under what circumstances it makes sense to rebuild an index, noting that (unless you’re Brent) the answer is probably “less often than you’d think.”
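
For reference, both features are database-level switches, so reproducing this kind of test is straightforward. A minimal sketch, with a hypothetical database and table name:

    -- Enable RCSI and ADR (ADR requires SQL Server 2019 or later).
    ALTER DATABASE [Sandbox] SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
    ALTER DATABASE [Sandbox] SET ACCELERATED_DATABASE_RECOVERY = ON;

    -- After loading data, compare data size against reserved space
    -- to see how much room row versions are taking up.
    USE [Sandbox];
    EXEC sp_spaceused N'dbo.SomeBigTable';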

Case-Sensitive Elements in SQL Server Integration Services

Andy Brownsword has a list:

Case sensitivity in data can trip us up unknowingly and Integration Services is no exception. Depending on the components we’re using we may notice different behaviour.

Without understanding the way case sensitivity is treated within SSIS packages, we may see unexpected results with string comparisons.

Let’s clear things up and dive into a few examples to see when they behave differently.

Click through for three examples. One more that I’d add is Script components, which are written in C# or VB.NET. Both languages are case-sensitive, and string comparison in .NET is case-sensitive unless you pass a StringComparison option that ignores case (for example, StringComparison.OrdinalIgnoreCase).
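
The database side of the contrast is easy to demonstrate: the engine follows column or instance collation (typically case-insensitive), while SSIS buffer comparisons are effectively case-sensitive. A quick sketch using two common collations:

    -- Case-insensitive collation: 'ABC' and 'abc' match, so this returns 1.
    SELECT CASE WHEN 'ABC' = 'abc' COLLATE Latin1_General_CI_AS
                THEN 1 ELSE 0 END AS CaseInsensitive;

    -- Case-sensitive collation, closer to how SSIS compares strings: returns 0.
    SELECT CASE WHEN 'ABC' = 'abc' COLLATE Latin1_General_CS_AS
                THEN 1 ELSE 0 END AS CaseSensitive;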

Converting Excel Dates and Times to SQL

Kristyna Ferris marks the date:

Hey data friends! This one comes from my personal vault (aka backlog of drafts I’ve been needing to write up) and is a really simple bit of code that I always forget how to write. So, to save us all some ChatGPT-ing, here’s my tried-and-true way of converting Excel Date & Time fields to a true Date & Time in SQL.

Click through for an example of the process.
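
Without spoiling Kristyna’s exact snippet, the general shape of the trick is worth noting: Excel stores a date-time as a floating-point serial number of days, with the time of day as the fractional part. A sketch along those lines, with hypothetical column and table names:

    -- Base date 1899-12-30 accounts for Excel's 1900 leap-year quirk
    -- (valid for any date after February 1900).
    -- Whole part = days; fraction = portion of an 86,400-second day.
    SELECT DATEADD(SECOND,
                   CAST(ROUND((ExcelSerial - FLOOR(ExcelSerial)) * 86400, 0) AS INT),
                   DATEADD(DAY, CAST(FLOOR(ExcelSerial) AS INT), '18991230')) AS RealDateTime
    FROM dbo.ExcelStaging;  -- ExcelSerial is a FLOAT loaded from the spreadsheet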

Read Local Files in SQL Server with PolyBase and MinIO

I have a new video:

In this video, I demonstrate how we can use PolyBase and MinIO to read files on a local machine in SQL Server.

This is one of the reasons I’m really happy that SQL Server introduced access to AWS S3 and S3-compatible storage with PolyBase in SQL Server 2022. The results are definitely slower than if you had direct file access, but it is possible.
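
The setup, roughly, is a database scoped credential plus an external data source pointing at the MinIO endpoint. A minimal sketch with made-up endpoint, bucket, and keys (PolyBase must be installed and enabled first):

    -- Needed once per database to protect the credential secret.
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password here>';

    CREATE DATABASE SCOPED CREDENTIAL MinioCredential
    WITH IDENTITY = 'S3 Access Key',
         SECRET = 'myaccesskey:mysecretkey';  -- access key : secret key

    CREATE EXTERNAL DATA SOURCE MinioStorage
    WITH (LOCATION = 's3://192.168.1.100:9000', CREDENTIAL = MinioCredential);

    -- Query a file straight out of a bucket.
    SELECT TOP(10) *
    FROM OPENROWSET(BULK '/mybucket/sales.parquet',
                    FORMAT = 'PARQUET',
                    DATA_SOURCE = 'MinioStorage') AS s;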

Microsoft Fabric Item Ownership Takeover

Sakshi Jain has an announcement:

Today, when an item owner leaves the company, their credentials expire, or they lose access, many Fabric items cease to function. For example, Lakehouses and their SQL Endpoints become inoperative, Pipelines fail to execute due to user access errors. In these situations, enabling another user to assume ownership would ensure business continuity.

We are pleased to announce that Fabric users with the right permissions can now take ownership of Fabric items.

This is a big deal. For the same reason that we don’t want individual users to own databases in SQL Server, having individual users own objects in Fabric lakehouses and endpoints was always a risky play. At least now, there’s a way to handle things when that person leaves the company.

Data Retention for Data in the Microsoft Fabric Lakehouse

Kenneth Omorodion clears out some data:

More than ever, organizations now aim for a well-defined approach to managing their data storage effectively. Some reasons for this include operational efficiency, cost management, regulatory compliance, and strategic decision-making. In this article, I will describe an approach to data retention management for Lakehouse files, to manage data storage when the data exists as files in the Fabric Lakehouse.

There’s nothing built in, but Kenneth makes it easy.

Using the Azure SQL DB Query Editor

Josephine Bush writes a query:

I keep losing track of this, wondering where it went. You have to access it at the database level. Adding this post to remind me for later. This came in very handy when my home internet went down and I couldn’t auth on my phone hotspot without timeouts in Azure Data Studio.

You can log in with SQL Server auth or Entra.

Read on for some notes about limitations. It is definitely a helpful tool for occasional queries, or as a simpler way to access data without having to set up a VPN and a whole bunch of tools.

Natively Compiled Stored Procedures in SQL Server

Yvonne Vanslageren covers a point of frustration for me:

Modern applications often demand lightning-fast performance from their databases, whether they’re handling large transactional workloads or complex analytical queries. SQL Server’s in-memory OLTP feature addresses these needs by using memory-optimized tables and natively compiled stored procedures to boost throughput and reduce latency. This post provides an overview of natively compiled stored procedures, how to create them, and best practices for performance monitoring and maintenance.

My point of frustration is pretty simple: these things work really, really well. But they’re also so limited that I have never been able to use one in production. Memory-optimized tables already have a limited set of good use cases, and natively compiled stored procedures pile on even more limitations, like using an awful collation (from the standpoint of humans working with the data) for string data.
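
For reference, here is the minimal shape of the feature, with hypothetical object names. The database needs a MEMORY_OPTIMIZED_DATA filegroup before any of this works, and the collation gripe above refers to the *_BIN2 binary collations these objects historically required for string index keys and comparisons:

    -- Memory-optimized table; hash index bucket count sized to expected row count.
    CREATE TABLE dbo.SessionState
    (
        SessionId INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1048576),
        Payload   NVARCHAR(4000) NOT NULL,
        UpdatedAt DATETIME2 NOT NULL
    ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
    GO

    -- Natively compiled procedure: SCHEMABINDING and an ATOMIC block are mandatory.
    CREATE PROCEDURE dbo.GetSession
        @SessionId INT
    WITH NATIVE_COMPILATION, SCHEMABINDING
    AS
    BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        SELECT SessionId, Payload, UpdatedAt
        FROM dbo.SessionState
        WHERE SessionId = @SessionId;
    END;
    GO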
