Press "Enter" to skip to content

Category: T-SQL

Finding the First and Last Number of a String in SQL and Excel

Kevin Wilkie does some soul searching, or at least string searching:

To enjoy these puzzles, you will need to go to the Official Advent of Code website, sign up for their leaderboards and whatnot if you choose to, and then continue to 2023 and Day 1. Today, we’ll start with Day 1 – since it is the first of our programming puzzles and work our way up from there…

We’re asked – given a string – to find the first (and last) number in that string. We are then to concatenate them, add them all up, and provide the result. It should be pretty simple, but let’s see…

Click through for Kevin’s two answers.

Comments closed

Set-Based vs Row-Based Code Considerations

Kevin Hill explains a concept:

In SQL Server, the terms “set-based” and “row-based” refer to different approaches or styles of writing SQL code to manipulate data. These styles have implications for performance, readability, and the way queries are processed. Let’s explore the differences between set-based and row-based code:

Click through for Kevin’s thoughts. One thing I’d re-emphasize (because Kevin did make this point), especially for people coming to SQL Server from Oracle, is that set-based operations are going to be more efficient about 95-99% of the time than their row-based equivalents. Oracle has a large number of optimizations to make cursor-style code efficient and T-SQL has very few of those, as set-based is the more natural expression of SQL.

One quick example of this is, prior to SQL Server 2012 and its extended support of window functions, the fastest officially supported way to calculate a running total was to build a cursor. The other alternatives, including self-joins, were much less efficient. There was an unsupported but much faster technique that relied on a peculiarity of how SQL Server sorts clustered indexes (the “quirky update” method), but because it relied on internals that could change with any patch, it was a risky maneuver.

Comments closed

Microsoft Fabric and Tabular Editor

Johnny Winter is excited:

Why the excitement on my part? Well to take advantage of all the great features in Tabular Editor, you really need to be able to connect and write via XMLA, be that for doing CI/CD pipelines or by making edits directly on the dataset.

What great new features does Tabular Editor unlock that you can’t just do in the online Power BI modelling experience in Fabric… tons!

Read on to see how Tabular Editor plays with Microsoft Fabric.

Comments closed

Indexing for Substring Searches

Daniel Hutmacher prepares the bloom filter:

A question from a client got me thinking. Relational databases (at least the ones I know and love) can’t really index for queries that use LIKE queries for a substring of a column value. If you want to search for strings beginning with a given string, a regular rowstore index will have you covered. If you’re looking for entire words or sentences, a full text index might be a good call. But because of the very way indexes work, you’ll never get great performance searching for just arbitrary parts of a string.

So today I’ll put on my lab coat and do a little rocket surgery, just to prove to the world that it can be done.

The suffix tree approach was an interesting one. I’ve also seen people attack this problem using bloom filters (as I alluded to in the link text) and n-grams. A commenter does note n-grams (specifically, tri-grams) as a viable solution as well.

Comments closed

Self-Join Optimizations and Index Intersection

Daniel Hutmacher shows off a possibility:

This blog post started as a “what if” contemplation in my head: Suppose you have a reasonably large table with a clustered index and a number of non-clustered indexes. If your WHERE clause filters by multiple columns covered by those non-clustered indexes, could it potentially be faster to rewrite that WHERE clause to use those non-clustered indexes?

The answer might surprise you.

To be honest, when I learned about the concept originally, I expected that there would be a great deal of use cases for it. But SQL Server rarely comes up with this answer on its own and I think that’s because in most scenarios, we’d need to do additional sorting or other expensive operations to get the multiple indexes aligned just right to make this the faster option.

Comments closed

Combining Window Functions and GROUP BY

Andy Brownsword aggregates some data:

We revisited window functions last week for T-SQL Tuesday. As we’re in that area there’s another example I thought was worth exploring. Can we group data whilst applying window functions in the same query?

Andy comes up with a final query that works perfectly fine, but there’s actually an easier answer in terms of code readability: the DISTINCT operator.

SELECT DISTINCT FinancialQuarter, QuarterAvg = AVG(SalesValue) OVER (PARTITION BY FinancialQuarter), YearAvg = AVG(SalesValue) OVER (PARTITION BY FinancialYear)
FROM #MonthlySales;

The FinancialQuarter column is unique so we can perform the window operation for averaging sales value over financial quarter and then by financial year. To remove the “duplicate” rows, we run DISTINCT and get the same results.

That said, the execution plan for this is a little more complex, as we have to go through a lazy spool on two separate occasions rather than the one that Andy’s solution comes up with. For sufficiently large datasets, that could make a difference, so as usual, choose the option that works better for your situation.

Comments closed

T-SQL Tuesday 168 Round-Up

Steve Jones lagged a bit:

I didn’t get much of a chance to check out the posts as I was at the PASS Data Community Summit, but I came home and started to work through them.

This was the 8th one I’ve hosted, which makes sense as I’ve taken over managing the party from Adam Machanic and there have been a few places I’ve had to fill in for missing hosts. In any case, here’s the roundup. I’m going in order of the comments as I see them on the blog.

Click through for this month’s list of entrants.

Comments closed

Query Tuning via Window Function

Rob Farley eliminates a self-join:

Sometimes query tuning involves taking a different approach to a problem. And given that other tuning options might be creating index(es) or redesigning tables – both of which are much more permanent changes to an environment – rewriting a query can often be just right.

Window functions seem to pop up quite often when rewriting queries, and an example around this would be appropriate for this month’s T-SQL Tuesday, hosted by Steve Jones (@way0utwest at X/Twitter).

Read on for the all-too-common scenario and how Rob improves an existing query.

Comments closed

Last Observation Carried Forward in SQL Server 2022

Barney Lawrence shows off a nice enhancement to T-SQL in SQL Server 2022:

With SQL Server 2022 came a much requested additional feature added from the SQL standards – IGNORE_NULLS. You can probably guess what it does. Drop in IGNORE_NULLS after your function and you can blur the non null values over those gaps giving us results like this:

Read on for the pre-2022 version of the query and what it does, versus the version with IGNORE_NULLS specified. This small flag is extremely helpful in time series statistical analysis and I’m glad it’s in SQL Server now.

Comments closed

A Good Use of LEAD()

Chad Callihan gets the lead out:

Imagine we want to find gaps in post history for users. If a user is posting every day, we’re happy and can assume they’re happy. If they’re only posting once every few weeks, we want to investigate why that is.

Read on to see how you can do this with the LEAD() function.

Comments closed