Press "Enter" to skip to content

Curated SQL Posts

Searching for Wildcard Characters in LIKE

Andy Brownsword is looking for a discount:

Performing a wildcard search by throwing a % into a LIKE expression is bread and butter. How do we handle this when we actually want to search for the wildcard though?

This was an issue I first saw early in my career before I was even building database solutions. The business had a back office solution where you could search through offers on the UI. Unfortunately it threw up some strange results.

Searching for ‘30%’ offers would return more than expected.

Read on for an example of the problem, as well as how you can resolve it.
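
To make the fix concrete, here is a minimal sketch in T-SQL against a made-up offers table (Andy's post walks through his own example): you can escape the literal % with an ESCAPE clause, or, in SQL Server, wrap the wildcard in square brackets.

-- Hypothetical offers table for illustration
CREATE TABLE #Offers (OfferName varchar(100));
INSERT INTO #Offers (OfferName)
VALUES ('30% off sitewide'), ('3 for 2 on books'), ('300 bonus points');

-- Naive search: the % in '30%' is treated as a wildcard, so '300 bonus points' also matches
SELECT OfferName FROM #Offers WHERE OfferName LIKE '%30%%';

-- Option 1: declare an escape character and escape the literal %
SELECT OfferName FROM #Offers WHERE OfferName LIKE '%30!%%' ESCAPE '!';

-- Option 2 (SQL Server): wrap the wildcard in square brackets
SELECT OfferName FROM #Offers WHERE OfferName LIKE '%30[%]%';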

A Primer on Vector Databases

Brendan Tierney gives us an overview of vector databases:

A Vector Database is a specialized database designed to efficiently store, search, and retrieve high-dimensional vectors, which are often used to represent complex data like images, text, or audio. Vector Databases handle the growing need for managing unstructured and semi-structured data generated by AI models, particularly in applications such as recommendation systems, similarity search, and natural language processing. By enabling fast and scalable operations on vector embeddings, vector databases play a crucial role in unlocking the power of modern AI and machine learning applications.

It’s interesting to see this pop up as a standalone database type (e.g., chromadb), though we’re also seeing some existing players like Postgres support vector database functionality via extension.
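
For a feel of what the extension route looks like, here is a minimal sketch using Postgres with the pgvector extension (the table, the three-dimensional vectors, and the values are made up purely for illustration; real embeddings run to hundreds or thousands of dimensions):

-- Requires the pgvector extension to be available on the server
CREATE EXTENSION IF NOT EXISTS vector;

-- Store an embedding alongside the source text
CREATE TABLE documents (
    id bigserial PRIMARY KEY,
    body text,
    embedding vector(3)
);

INSERT INTO documents (body, embedding) VALUES
    ('first document',  '[0.1, 0.9, 0.2]'),
    ('second document', '[0.8, 0.1, 0.3]');

-- Nearest-neighbor search: <-> is pgvector's Euclidean distance operator
SELECT body
FROM documents
ORDER BY embedding <-> '[0.1, 0.8, 0.3]'
LIMIT 1;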

vCore-Based Subscription Limits for Azure SQL DB and Synapse Dedicated SQL Pools

Raj Tiwari announces a change in subscription limits:

New vCore based limits: The new limits will be based on vCores per Subscription per Region, which will be directly equivalent to DTU and DWU.

Default logical servers limit: The previous limits on Logical Server DTUs have been discontinued. All new and existing subscriptions will now have a default limit of 250 logical servers. 

Configurable vCore limits: Subscription vCore limits can now be easily managed through the support section on the Azure Portal, with approvals typically processed within minutes.

Read on to learn more about these limits and how you could extend them.

Migrating SQL Server Database Files between Storage Subsystems

Andy Yun does a bit of shuffling:

In my role at Pure Storage, I often engage with customers who wish to migrate their SQL Server databases off of their prior storage onto our hardware. And after some digging around for prior-published material, I was surprised to find that there really wasn’t much that was comprehensive. After all, one doesn’t change SANs too often. But when it does happen, it is nice to have some reference material from others who have. So I decided to try and give a good overview of how I’d approach the challenge.

This is meant to be a “food for thought” kind of post. I’m going to keep things somewhat high level, but will provide links to other blogs and material that can help you continue down whatever path you choose. And for simplicity, I’m going to limit this scope to a single SQL Server.

Read on for a few questions you should answer, followed by some notes and preferences. Andy’s filegroups tip is also a really good one.
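
For reference, the core mechanics of relocating a data file in SQL Server look something like the sketch below (database, file, and path names are hypothetical, and this is only one of the approaches Andy discusses): repoint the file's metadata, take the database offline, copy the file at the operating system level, and bring it back online.

-- Repoint the file's metadata to its new location (hypothetical names and paths)
ALTER DATABASE SalesDB
    MODIFY FILE (NAME = SalesDB_Data, FILENAME = 'P:\NewSAN\Data\SalesDB.mdf');

-- Take the database offline, copy the physical file to the new storage, then bring it back
ALTER DATABASE SalesDB SET OFFLINE WITH ROLLBACK IMMEDIATE;
-- ...copy the .mdf from the old SAN path to P:\NewSAN\Data\ at the OS level...
ALTER DATABASE SalesDB SET ONLINE;

-- Confirm SQL Server now sees the new path
SELECT name, physical_name
FROM sys.master_files
WHERE database_id = DB_ID('SalesDB');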

Comparing Power BI Stacked Column Charts

Meagan Longoria compares and contrasts:

One of the new features in the August Power BI Desktop release is the updated legends that are styled to more accurately reflect the per-series formatting on the visual. This made me curious how close I could get to the clean look of a Deneb (vega-lite) stacked bar chart.

I used open source data from the Vega github repo and applied a few filters in Power BI. I used exactly the same source data in each chart, filtering down to five sites and 9 varieties. I chose a visual with a legend, data labels, and total labels to display the differences. We can argue about whether you should show a chart with this many labels another day.

Click through to see how the in-built visual compares to a custom visual of the same form.

Compressing Indexes and Shrinking Azure SQL MI Databases

Kendra Little has a good reason for an often-bad act:

Shrinking databases in SQL Server isn’t fun – it’s slow, it causes blocking if you forget to use the WAIT_AT_LOW_PRIORITY option, and sometimes it persistently fails and refuses to budge until you restart the instance. You only want to shrink a SQL Server database when you’ve got a good reason and a lot of patience.

If you’re using Azure SQL Managed Instance and you haven’t already used data compression on your indexes and shrunk your databases, you probably have two good reasons to do both of those things: performance and cost reduction.

Compressing indexes is very often (almost always?) a good thing. Shrinking databases is very often (again, almost always?) a bad thing. This is like a buddy cop movie for your database.

Kendra gives some good advice but also lays out a warning if you’re on General Purpose V1, so read the whole thing.
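
The two halves of that buddy cop pairing look roughly like this in T-SQL (the index, table, file name, and target size are hypothetical, and WAIT_AT_LOW_PRIORITY on shrink requires SQL Server 2022 or Azure SQL):

-- Rebuild an index with page compression (hypothetical object names)
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
    REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);

-- Then reclaim the freed space, waiting at low priority instead of blocking other sessions
DBCC SHRINKFILE (N'MyDatabase_Data', 10240) WITH WAIT_AT_LOW_PRIORITY;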

Random Forest Missing Data Imputation using missRanger

Michael Mayer handles missing data:

{missRanger} is a multivariate imputation algorithm based on random forests, and a fast version of the original missForest algorithm of Stekhoven and Buehlmann (2012). Surprise, surprise: it uses {ranger} to fit random forests. Especially combined with predictive mean matching (PMM), the imputations are often quite realistic.

This looks like an interesting package. At first, I thought it was a way of generating predictions outside the boundaries of training data, which gave me some concerns: a classic limitation of random forest as an algorithm is that it will not even try to predict values outside the range of what it sees in its training data. If the largest label is 10 and the smallest is 0, you won’t see a prediction of 11 or 50, no matter how you scale the inputs.

Instead of doing that, missRanger looks like it’s filling in missing data using a clever approach. That’s quite useful for dealing with incomplete data, a really common problem whose good solutions tend to be complex enough that people typically ignore them in favor of simpler but less useful approaches like dropping rows altogether.

mssparkutils Now notebookutils, and Validating DAGs in Fabric

Sandeep Pawar gives us two quick hits:

First, if you haven’t noticed mssparkutils has been officially renamed to notebookutils. Check out the official documentation for details. Be sure to use/update your notebooks to notebookutils.

Read on for a pair of notes around this name change, as well as some capabilities to validate DAGs when using runMultiple to orchestrate multiple notebook executions.

Setting Row Label and Key Columns in Power BI

Chris Webb makes use of metadata:

A few weeks ago I wrote a post on how to improve the results you get from Power BI Copilot by editing the Linguistic Schema. As I mentioned, though, there are in fact lots of different ways that you as a Power BI semantic model developer can improve the results you get from Copilot and in this post I’ll show you another one: setting the Row Labels and Key Columns properties on a table.

Read on to see how these can affect results.
