2025-09-29 – Curated SQL

Scaling On-Prem Vector Search with Ollama and Nginx

Published 2025-09-29 by Kevin Feasel

When you call out to an external embedding service from T-SQL via REST over HTTPS, you’re limited by the throughput of that backend. If you’re running a single Ollama instance, you’ll quickly hit a ceiling on how fast you can generate embeddings, especially for large datasets. I recently attended an event and discussed this topic. My first attempt at generating embeddings was for a three-million-row table. I had access to some world-class hardware to generate the embeddings. When I arrived at the lab and initiated the embedding generation process for this dataset, I quickly realized it would take approximately 9 days to complete. Upon closer examination, I found that I was not utilizing the GPUs to their full potential; in fact, I was only using about 15% of one GPU’s capacity. So I started to cook up this concept in my head, and here we are, load balancing embedding generation across multiple instances of ollama to more fully utilize the resources.

Click through for the solution.

Comments closed

Inspecting the Postgres Write-Ahead Log

Published 2025-09-29 by Kevin Feasel

Henrietta Dombrovskaya digs into the write-ahead log:

First, when the users fixed one of the primary suspects jobs, the situation with WAL growth didn’t change. Second, the rate of the growth couldn’t be explained by these suboptimal jobs: the data volumes they were removing and reloading were still magnitudes smaller than the WAL size we were dealing with. Finally, I decided to do what I should have done from the start – to take a look at what exactly was in these super-fast growing WALs.

Read on to learn what Henrietta found. Also check out the comments for some additional context.

Comments closed

Tokenization in SQL Server

Published 2025-09-29 by Kevin Feasel

Sebastiao Pereira demonstrates a combination of encryption and redirection to store sensitive data:

As privacy regulations tighten like General Data Protection Regulation (GDPR), Health Insurance Portability and Accountability Act (HIPAA), Payment Card Industry Data Security Standards (PCI DSS) organizations and more, there is an increased focus to protect sensitive information within databases. Tokenization is an option to adhere to those regulations. Let’s see how to implement SQL tokenization in SQL Server.

This is a reasonably clever solution, though if you need to search on any of the tokenized (i.e., encrypted and moved to a separate table) values, performance would be miserable. Even displaying the results for a moderately sized result set would run into serious performance issues. I suppose that if you, for some regulatory reason, need to keep these tokens stored elsewhere from the data, then you manage expectations the best you can.

Comments closed

Generating and Recognizing Hash Collisions in SQL Server

Published 2025-09-29 by Kevin Feasel

Hugo Kornelis continues a deep dive into hash tables in SQL Server:

Welcome back! In the previous parts, I first showed how a Hash Match (Left Outer Join) can give insight in the order of data in a hash table, and then used that trick to obtain and verify some interesting insights into the internal structure of such a table. It’s now time to see if this same trick can also be used to find hash collisions.

Hugo finds some interesting results along the way, so check it out.

Comments closed

SSMS and Large Text Columns

Published 2025-09-29 by Kevin Feasel

Rudy Rodarte learns a lesson:

Recently, I had to export some query results to CSV and Excel. One of the columns was extremely large query text which was defined as NVARCHAR(MAX). Here are some of the issues I faced with this request and how I over came them.

SSIS or the Import/Export wizard is fine. This is also a good use case for bcp or writing your own query using dbatools and outputting the results to file.

Comments closed

Losing Data with PostgreSQL and Jepsen

Published 2025-09-29 by Kevin Feasel

Jeremy Schneider performs some tests:

This is a follow‑up to the last article: Run Jepsen against CloudNativePG to see sync replication prevent data loss. In that post, we set up a Jepsen lab to make data loss visible when synchronous replication was disabled — and to show that enabling synchronous replication prevents it under crash‑induced failovers.

Since then, I’ve been trying to make data loss happen more reliably in the “async” configuration so students can observe it on their own hardware and in the cloud. Along the way, I learned that losing data on purpose is trickier than I expected.

Click through to learn more. Jepsen has been the gold standard in testing distributed database systems for data loss.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

Day: September 29, 2025

Scaling On-Prem Vector Search with Ollama and Nginx

Inspecting the Postgres Write-Ahead Log

Tokenization in SQL Server

Generating and Recognizing Hash Collisions in SQL Server

SSMS and Large Text Columns

Losing Data with PostgreSQL and Jepsen