September 2023 – Page 9

A Primer on Vector Search

Published 2023-09-13 by Kevin Feasel

Phil Booth takes a look at vector search systems:

Recently I built a system that uses vector search to logically truncate long documents and retain the most significant parts according to some search term. I’m a dummy, with no background in machine learning or mathematics, so there were new concepts for me to understand and implementation details to figure out. This post summarises what I learned.

Vector search and vector databases are becoming a fairly hot topic, so this at least grounds you on what they are.
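For a rough feel of the core idea, here is a minimal sketch in Python. It is not Phil's implementation: the function names and the toy three-dimensional vectors are made up for illustration, and a real system would get its vectors from an embedding model. It scores each document chunk against a query by cosine similarity and keeps the top k chunks in their original order.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, k):
    """Return indexes of the k chunks most similar to the query, in document order."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    # Highest similarity first; ties broken by position in the document.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return sorted(i for _, i in scored[:k])


# Hypothetical embeddings for four chunks of a document (assumed values).
chunks = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.0, 0.0]

print(top_k_chunks(query, chunks, 2))  # [0, 2]
```

A brute-force scan like this is fine for a handful of chunks. Vector databases exist largely to make the same lookup fast over much larger collections.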

An Apologia for SQL Trace

Published 2023-09-13 by Kevin Feasel

Rob Farley defends SQL Trace…sort of:

This month, Grant Fritchey (@scarydba) asks us to write about Extended Events (XE). He mentions that they were a topic once before (it was in 2015 – more than 8 years ago), and mentions about how they deserve more attention than that.

I kinda agree, but I kinda don’t.

Read on for Rob’s take on the matter.

The Joy of sp_HumanEvents

Published 2023-09-13 by Kevin Feasel

Erik Darling makes a pitch:

While my relationship with Extended Events is complicated for many reasons:

Awful documentation

Hardly any guidance on usage

Almost nothing useful about what type of target to use when

Everything stored in XML

Slow, unfriendly GUI in SSMS

My need to use them while consulting outweighs my gripes and grievances about how Microsoft has chosen to write about, use, and present the data to you.

That’s where my stored procedure sp_HumanEvents comes in handy.

In fairness, Erik put his virtual money where his virtual mouth is, and sp_HumanEvents is put together quite well.

Capturing Autogrowth Events in SQL Server

Published 2023-09-13 by Kevin Feasel

Ben Miller shares an extended event session with us:

I wanted to share one of the Extended Events I always put on a server when I am in charge of it. It has to do with File growths and captures some important things for me. Before you say that it is in the system_health extended events session, I know that it is there. I have had system_health sessions cycle pretty fast and there are a lot of other events in that trace, so I decided to make my own for just that specific thing so that I can archive the sessions and keep the disk clean as well as pull this information into a table and analyze data in a tabular way instead of mining XE files.

Read on for that script and what it does in practice.

Restoring Azure SQL DB Indexes

Published 2023-09-13 by Kevin Feasel

Brent Ozar answers a question:

I got an interesting request for consulting, and I’m going to paraphrase it:

We were using Azure SQL DB with automatic index tuning enabled for months. Things were going great, but… we just deployed a new version of our code. Our deployment tool made the database schema match our source control, which… dropped the indexes Azure had created. How do we get them back?

Read on for Brent’s answer.

A Primer on Latch Waits

Published 2023-09-13 by Kevin Feasel

Kendra Little gives us a sneak peek:

I’ve long found it tricky to remember and explain the differences between three similar-sounding waits in SQL Server that all have “LATCH” in the name: PAGELATCH, LATCH, and PAGEIOLATCH waits.

Here’s an illustration that explains these waits, along with wait subtypes.

This is an excerpt from my new comic, “Wait Stats in SQL Server.”

Click through for the excerpt, as well as some more detail on these latch types.
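As a quick memory aid, the three families can be told apart by the prefix of the wait type name. The sketch below is mine rather than anything from Kendra's comic: the function name is hypothetical and the one-line descriptions are a simplification.

```python
# Prefix -> short description of what the session is waiting on.
LATCH_FAMILIES = {
    "PAGEIOLATCH_": "page being read from disk into the buffer pool (I/O)",
    "PAGELATCH_": "page already in memory in the buffer pool (no I/O)",
    "LATCH_": "internal in-memory structure other than a buffer pool page",
}


def classify_latch_wait(wait_type):
    """Return (family, description) for a latch wait type, or None otherwise."""
    name = wait_type.upper()
    for prefix, description in LATCH_FAMILIES.items():
        if name.startswith(prefix):
            return prefix.rstrip("_"), description
    return None


for wait in ["PAGEIOLATCH_SH", "PAGELATCH_EX", "LATCH_EX", "CXPACKET"]:
    print(wait, "->", classify_latch_wait(wait))
```

The suffix (such as SH or EX) is the latch mode, which is where the wait subtypes come in.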

Creating a Moving Average Time Series in Power BI

Published 2023-09-13 by Kevin Feasel

Naiden Borimechkov builds a moving average:

We begin by populating a custom dataset containing real historical time series data for the top 1000 stocks in the US stock market for the last 2 years. We need the closing prices as well as the percentage changes for each of the stocks.

After creating a basic line chart, Naiden incorporates some DAX to build up the rolling averages of results.
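Naiden's version is written in DAX inside Power BI. Purely to show the arithmetic behind a rolling average, here is a small Python sketch. The function name, the window size, and the closing prices are assumptions for illustration, not values from the post.

```python
from collections import deque


def moving_average(values, window):
    """Simple moving average; positions without a full window yield None."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    buffer = deque()
    total = 0.0
    averages = []
    for value in values:
        buffer.append(value)
        total += value
        if len(buffer) > window:
            total -= buffer.popleft()  # drop the value that left the window
        averages.append(total / window if len(buffer) == window else None)
    return averages


# Hypothetical daily closing prices for one stock.
closes = [10.0, 11.0, 12.0, 13.0, 14.0]
print(moving_average(closes, 3))  # [None, None, 11.0, 12.0, 13.0]
```

Keeping a running total means each new point costs one addition and one subtraction rather than re-summing the whole window.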

Plotting SVM Decision Boundaries in R

Published 2023-09-12 by Kevin Feasel

Steven Sanderson goes right up to the edge:

Support Vector Machines (SVM) are a powerful tool in the world of machine learning and classification. They excel in finding the optimal decision boundary between different classes of data. However, understanding and visualizing these decision boundaries can be a bit tricky. In this blog post, we’ll explore how to plot an SVM object using the e1071 library in R, making it easier to grasp the magic happening under the hood.

Read on to see how you can perform this analysis as well.

Running Apache Kafka in Windows

Published 2023-09-12 by Kevin Feasel

Jim Galasyn gives up the ghost:

Is Windows your favorite development environment? Do you want to run Apache Kafka® on Windows? Thanks to the Windows Subsystem for Linux 2 (WSL 2), now you can, and with fewer tears than in the past. Windows still isn’t the recommended platform for running Kafka with production workloads, but for trying out Kafka, it works just fine. Let’s take a look at how it’s done.

There was a time in which running Kafka on Windows meant downloading Windows-specific installers, workaround executables to deal with NTFS, and all the attendant problems of being the third operating system on the list. Using WSL2 is definitely a better approach.

Finding Object Counts for S3 Buckets

Published 2023-09-12 by Kevin Feasel

The Big Data in Real World team sees a problem:

There is no separate command in AWS CLI to find the number of objects in an S3 bucket but there is a workaround.

Read on for the solution to this. The way that S3 and Azure Blob Storage (without hierarchical namespaces) store files as tags and treat folders as cosmetic is neat from a technical standpoint, though it goes counter to how we’d expect a file system to behave.
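To illustrate the "folders are cosmetic" point, here is a small Python sketch over a made-up list of object keys. It does not call AWS and is not the workaround from the linked post; the function names and keys are hypothetical. The idea is that a bucket is a flat set of keys, so counting objects "in a folder" is really counting keys that share a prefix, and the folders themselves are derived from the keys by splitting on a delimiter.

```python
def count_objects(keys, prefix=""):
    """Count keys that start with the given prefix (empty prefix = whole bucket)."""
    return sum(1 for key in keys if key.startswith(prefix))


def list_folders(keys, prefix="", delimiter="/"):
    """Derive the apparent sub-folders directly under a prefix from the keys alone."""
    folders = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
    return sorted(folders)


# Hypothetical keys in a bucket; no folder objects exist anywhere.
keys = [
    "logs/2023/09/a.json",
    "logs/2023/09/b.json",
    "logs/2023/10/c.json",
    "readme.txt",
]

print(count_objects(keys))                   # 4
print(count_objects(keys, "logs/2023/09/"))  # 2
print(list_folders(keys, "logs/2023/"))      # ['logs/2023/09/', 'logs/2023/10/']
```

Nothing in the key list represents a directory, yet a folder view falls out of the prefixes, which is why getting a count means walking the keys.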
