Press "Enter" to skip to content

Category: T-SQL

SQL Server Regex

Dev Nambi has a new open-source project:

Databases store text, and the best way to manipulate text is to use a regular expression (‘regex’). Using regular expressions in SQL queries has been possible in many database engines for decades.

Now you can use regular expressions in SQL Server queries, too. I’ve created an open-source project, sql-server-regex, that lets you run regular expressions in T-SQL queries using scalar and table-valued functions.

This is a set of CLR functions which use the built-in .NET regular expressions functionality.  That makes it pretty easy to see how the code works.

Comments closed

A T-SQL Date Dimension

Vladimir Oselsky builds a date dimension in T-SQL:

Before we get into discussing how to create it date dimension and how to use it, first let’s talk about what it is and why do we need it. Depending on who you talk to, people can refer to this concept as “Calendar table” or “Date Dimension,” which is usually found in Data Warehouse. No matter how it is called, at the end of the day, it is a table in SQL Server which is populated with different date/calendar related information to help speed up SQL queries which require specific parts of dates.

In my case, I have created it to be able to aggregate data by quarters, years and month. Depending on how large your requirements are it will add additional complexity to building it. Since I don’t care about holidays (for now at least), I will not be creating holiday schedule which can be complicated to populate.

I love date dimensions, even on non-warehouse databases, because it’s an easy way of providing additional context to time series data.  Think about graphing orders per day in an industry with weekday-versus-weekend trends; a date dimension lets you strip out weekends (maybe plotting them separately) or even lets you build day-of-week analysis for each day, or looking at week of the month, etc.  You might also be interested in computing holidays.

Comments closed

STRING_AGG() Performance

Aaron Bertrand wants to know how the STRING_AGG() function performs:

We can see that our FORCESCAN hint really did make things worse – while we shifted the cost away from the clustered index seek, the sort was actually much worse, even though the estimated costs deemed them relatively equivalent. More importantly, we can see that STRING_AGG() does offer a performance benefit, whether or not the concatenated strings need to be ordered in a specific way. As with STRING_SPLIT(), which I looked at back in March, I am quite impressed that this function scales well prior to “v1.”

Given that the early releases tend to be “get the thing working” and later CTPs are around “make the thing faster,” it’s nice to see that STRING_AGG() is already ready for prime-time, and makes me wonder if they’ll make it even faster by RTM.

Comments closed

When Is NOLOCK Okay?

Erik Darling wants to figure out the acceptable boundaries for NOLOCK:

Picture a couple developers who started their app in the cloud, where they can’t get fancy with tempdb, fast disks aren’t in the budget yet, along with that beefier server with some extra RAM. They may not be able to turn on RCSI or SI at the drop of a hat; tempdb would keel over with the row versioning as part of a workload that already uses it pretty heavily.

They still need to run reports, either for users, or for higher ups at the company, and they can ask for them at any time. Caching the data when user activity is low and reporting against it when someone asks may raise some questions, like “why doesn’t my sales data show anything from today?”, or worse. You could invalidate the cache every X minutes, but that doesn’t help because then you need to re-run that reporting query every X minutes. That’s only moderately better than letting users query it at will.

Figuring out when read uncommitted is acceptable is a business decision.  As much as I dislike using NOLOCK, as long as the people on the business side understand the risks, it’s their call.

Comments closed

Pivoting Data

Jana Sattainathan explains the PIVOT operator:

The results are so much easier to look at and comprehend, aren’t they? All object types for a schema are on a single line and it is easy for us to do impact analysis visually.

Sometimes doing it in T-SQL is the best approach, but pivoting is generally something which is cheaper in the application tier, whether you’re building a report, dashboard, or web app.

Comments closed

New T-SQL Features

Slava Murygin looks at some new functions in the vNext CTP 1.1:

Since Microsoft introduced XML support in SQL Server, the most common string concatenation technique was use of “XML PATH(”)” like this:

SELECT SUBSTRING(
(SELECT ‘, ‘ + name FROM master.sys.tables
FOR XML PATH(”))
,3,8000);
GO

Now you can aggregate your strings by using function “STRING_AGG”:

SELECT STRING_AGG(name, ‘, ‘) FROM master.sys.tables;

Read on for the other three.  This aggregation function, however, would make some of my code a lot simpler and easier to explain to junior database developers.  I just want it to perform well is all.

Comments closed

Passing Parameters To SQL Queries Via Power BI

Chris Webb shows how to use the Value.NativeQuery() function to pass parameters to SQL Server queries:

It looks like, eventually, this will be the way that any type of ‘native’ query (ie a query that you write and give to Power Query, rather than a query that is generated for you) is run against any kind of data source – instead of the situation we have today where different M functions are needed to run queries against different types of data source. I guess at some point the UI will be updated to use this function. I don’t think it’s ‘finished’ yet either, because it doesn’t work on Analysis Services data sources, although it may work with other relational data sources – I haven’t tested it on anything other than SQL Server and SSAS. There’s also a fourth parameter for Value.NativeQuery() that can be used to pass data source specific options, but I have no idea what these could be and I don’t think there are any supported for SQL Server. It will be interesting to see how it develops over the next few releases.

It’s good to know that you can parameterize queries now.

Comments closed

Truncate Table And Stats

Kendra Little shows that TRUNCATE TABLE does not always reset stats:

You might expect to see that the statistic on Quantity had updated. I expected it, before I ran through this demo.

But SQL Server never actually had to load up the statistic on Quantity for the query above. So it didn’t bother to update the statistic. It didn’t need to, because it knows that the table is empty, and this doesn’t show up in our column or index specific statistics.

Check it out.

Comments closed

Lists And Ranges In Powershell And T-SQL

Phil Factor has an interesting post on lists and ranges in two languages:

When a  list like ‘1,3,5,6,9’, or ’12 Jan 2016,14 Jan 2016, 18 Feb 2016’  contains a datatype that can be unambiguously sorted in the order of the values of the datatype, it becomes possible to imply a range. This will trim unwieldy lists significantly if they have a lot of contiguous values.  ‘1,2,3,4,5’ can be expressed as 1-5 (or 1..5). The starting integer is separated from the end integer in the range by a dash sign. This representation rather clashes with the minus sign, represented by the same symbol, but the comma removes the ambiguity. A number followed immediately by a ‘-‘ means that the ‘-‘is a range symbol.  As with SQL’s BETWEEN clause that selects rows, the range includes all the integers in the interval including both endpoints. Because the range syntax is intended to be a more compact form,  it is generally only used where there are at least three contiguous values.

Interesting article.  I recommend checking it out.

Comments closed

OUTPUT With Identity Columns

Kenneth Fisher walks through a scenario trying to archive data using the OUTPUT clause, but where there is an identity column involved:

So what’s the problem? Well, to start with by using the SELECT TOP 0 INTO Archive method of creating the Archive table you’ve created the archive table with an identity column. Get rid of that identity property (you still want the column) and you don’t have a problem. But the really cool part of this is that it reminded me that the OUTPUT clause has a field list.

Also check out the comments for another solution.

Comments closed