Press "Enter" to skip to content

Category: T-SQL

More Number Series Generators

Itzik Ben-Gan continues a series:

This is the second part in a series about solutions to the number series generator challenge. Last month I covered solutions that generate the rows on the fly using a table value constructor with rows based on constants. There were no I/O operations involved in those solutions. This month I focus on solutions that query a physical base table that you pre-populate with rows. For this reason, beyond reporting the time profile of the solutions like I did last month, I’ll also report the I/O profile of the new solutions. Thanks again to Alan Burstein, Joe Obbish, Adam Machanic, Christopher Ford, Jeff Moden, Charlie, NoamGr, Kamil Kosno, Dave Mason, John Nelson #2 and Ed Wagner for sharing your ideas and comments.

Read on for three more solutions, as well as a re-evaluation of the solutions in the first article.
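For reference, here is a minimal sketch of the constants-based, on-the-fly approach from last month's installment (my own illustration, not necessarily Itzik's exact code): cross joins over a table value constructor multiply a handful of constant rows into tens of thousands, and ROW_NUMBER assigns the sequence, with no base table and no I/O.

WITH
  L0 AS (SELECT 1 AS c FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),
                                    (1),(1),(1),(1),(1),(1),(1),(1)) AS D(c)),  -- 16 rows
  L1 AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),                        -- 256 rows
  L2 AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),                        -- 65,536 rows
  Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L2)
SELECT n
FROM Nums
ORDER BY n
OFFSET 0 ROWS FETCH NEXT 1000 ROWS ONLY;  -- first 1,000 numbers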


Indicators of Schema Issues

Erik Darling has a good list of schema-related issues:

Something is broken in the way that you store data.

You’re overloading things, and you’re going to hit big performance problems when your database grows past puberty.

Most of what he’s describing in this post is a failure of atomicity, which implies a failure to achieve first normal form. Mind you, all of these functions are perfectly reasonable as part of data loading, and many of them are perfectly reasonable in the SELECT clause of a query (though that’s still a sign of failure of atomicity), but once you start throwing them into the WHERE clause, we’ve got problems.
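As a hypothetical illustration (the table and column names are mine, not Erik's), here is what that failure of atomicity looks like once it reaches a WHERE clause: the parsing function has to run against every row, so the optimizer can't seek an index on the column.

-- Overloaded column: 'East|C0042' packs region and customer into one value.
SELECT OrderID
FROM dbo.Orders
WHERE LEFT(RegionCustomer, CHARINDEX('|', RegionCustomer) - 1) = N'East';  -- scan, not seek
-- Stored atomically, the predicate is sargable and can use an index on Region:
-- WHERE Region = N'East';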


Random Number Generation in T-SQL

Chad Callihan generates random numbers:

The first way to generate a random number is by using the SQL Server RAND function. This will return a float value. 

Both of the techniques Chad shows are examples of generating uniform distributions—distributions in which any value is just as likely as any other. There are plenty of places in which a uniform is great: drawing by lot is one of them. But when you’re generating artificial data, the results tend to look unrealistic because not many natural phenomena follow uniform distributions.

If you’re interested in generating numbers which tend to look more realistic when generating artificial data, I have a post on generating random numbers built on a normal distribution.
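For a rough sense of the contrast, here is my own sketch using the Box-Muller transform (not necessarily the approach in the linked post):

-- Uniform: every value in [0, 1) is equally likely.
SELECT RAND() AS uniform_value;

-- Normal-ish: combine two uniform draws via Box-Muller; mean 50 and standard deviation 10 are arbitrary.
DECLARE @u1 float = 1e0 - RAND(),  -- shift into (0, 1] so LOG() never sees zero
        @u2 float = RAND();
SELECT 50 + 10 * SQRT(-2 * LOG(@u1)) * COS(2 * PI() * @u2) AS normal_value;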


Performance Comparison of ISNULL and COALESCE

Erik Darling notes the edge cases where ISNULL() can be faster than COALESCE():

Sometimes there are very good reasons to use either coalesce or isnull, owing to them having different capabilities, behaviors, and support across databases.

But isnull has some particular capabilities that are interesting, despite its limitations: only two arguments, specific to SQL Server, and uh… well, we can’t always get three reasons, as a wise man once said.

There is one thing that makes isnull interesting in certain scenarios. Let’s look at a couple.

Read the whole thing. You (hopefully) won’t be in this situation often, but if you do happen to find yourself in it for whatever reason (and you can’t re-write the query to something better), it’s good to keep this in mind.
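One capability difference that is easy to demo (my example; Erik's post covers the performance angle): ISNULL takes its return type from the first argument, while COALESCE determines its type from all arguments by data type precedence.

DECLARE @v varchar(5) = NULL;
SELECT ISNULL(@v, 'too long to fit')   AS isnull_result,    -- silently truncated to varchar(5): 'too l'
       COALESCE(@v, 'too long to fit') AS coalesce_result;  -- full string; the type comes from the longer literal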


Faster String Concatenation with SQL Server

Steve Stedman has some tips for people who need to combine strings:

What the previous example shows is that the longer the string gets, the slower the concatenation is. So instead we declare a second VARCHAR(MAX) variable called @stringBuilder, and each time through the loop we concatenate to that, then every thousandth time through the loop we take the @stringBuilder variable and concatenate it on to the @bigString, then clear out the @stringBuilder variable. This keeps the @stringBuilder variable relatively short, and reduces the number of concatenations to the @bigString to roughly 1/1000th the original.

Click through for a demo of the process. I don’t think I’ve been in too many situations where string concatenation was a performance killer in SQL Server, but it’s good to know.
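If it helps to picture it, here is a minimal sketch of the batching pattern Steve describes (the variable names come from the quote; the loop itself is my own illustration):

DECLARE @bigString varchar(max) = '',
        @stringBuilder varchar(max) = '',
        @i int = 1;

WHILE @i <= 100000
BEGIN
    SET @stringBuilder = @stringBuilder + 'x';           -- keep appending to the short variable
    IF @i % 1000 = 0
    BEGIN
        SET @bigString = @bigString + @stringBuilder;    -- flush to the big string every 1,000 iterations
        SET @stringBuilder = '';
    END;
    SET @i += 1;
END;

SET @bigString = @bigString + @stringBuilder;            -- flush whatever is left
SELECT LEN(@bigString) AS final_length;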


The Pain of SELECT *

Grant Fritchey strongly recommends against SELECT *:

Quite a few years ago, I wrote a post about SELECT * and performance. That post had a bit of a click-bait title (freely admitted). I wrote the post because there was a really bad checklist of performance tips making the rounds (pretty sure it’s still making the rounds). The checklist recommended a whole bunch of silly stuff. One silly thing it recommended was to simply substitute ALL columns (let me emphasize that again, name each and every column) instead of SELECT * because “it was faster”.

My post, linked above, showed that this statement was nonsense. Let’s be clear, I’m not a fan of SELECT *. Yes, it has some legitimate functionality. However, by and large, using SELECT * causes performance problems.

I’ll use SELECT * for one-off queries (well, something like SELECT TOP(100) * but same difference), but it’s a really bad practice to include that in application code for the reasons Grant mentions.


Using AT TIME ZONE

Chad Callihan walks us through using AT TIME ZONE in SQL Server:

Dealing with time zones in general can be a headache. Thankfully, SQL Server 2016 included a new clause to make working with time zones more manageable. Let’s look at a couple examples:

I’m in Eastern Standard Time so we’ll start here. Keep in mind that when no offset information is included, SQL Server is going to assume that the date value is in the target time zone.

This works best when your dates are stored in UTC, but Chad does show how to convert between two other time zones. This does not perform all that well when you need to convert a lot of rows, but if you’re doing one or two conversions, it’ll do just fine.
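Two quick examples of the clause in action (the specific values are mine; Eastern as in the quote):

-- No offset on the input, so SQL Server assumes it's already Eastern and just attaches the offset:
SELECT CAST('2021-06-01 12:00:00' AS datetime2)
       AT TIME ZONE 'Eastern Standard Time' AS assumed_eastern;

-- A UTC-stored value converted to Eastern: declare the source zone first, then the target:
SELECT (CAST('2021-06-01 12:00:00' AS datetime2) AT TIME ZONE 'UTC')
       AT TIME ZONE 'Eastern Standard Time' AS utc_to_eastern;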


Defining Deferred Compilation

Gail Shaw explains the concept of deferred compilation:

When I talked about row estimations for table variables, I mentioned ‘deferred compile’, but didn’t give a whole lot of details. What, then, is a deferred compilation? Let’s start with how batches work normally.

T-SQL is an interpreted language. While we talk about compiles, they’re not compilations in the sense of what happens to C++. There’s no conversion of the script to a machine language or intermediate language which is used from that point onwards. Every time a batch executes, it has to be parsed, bound and have an execution plan generated or fetched from cache.

Click through to learn what it does mean.
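A small repro sketch of the effect (my example, not Gail's): with deferred compilation in SQL Server 2019, the statement that reads the table variable is compiled when it first executes, so its cardinality estimate reflects the rows actually loaded rather than the old fixed guess of 1.

DECLARE @t table (id int NOT NULL);

INSERT INTO @t (id)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

SELECT COUNT(*) AS row_count
FROM @t;   -- compare the estimated rows on the table variable scan in the actual execution plan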


One Reason to Avoid SELECT *

Andy Levy has a new reason for us:

I got a merge request a while back that included a change I wasn’t expecting from one of my developers. All they were doing was renaming a column on a table. We had names settled a while ago. What’s up with that?

Turns out, when I created the table, I named one of the fields BrithYear. This merge request corrected it to BirthYear. The misspelling slipped past me and past everyone else; it was only discovered because, when this developer was building a new query into their code, they spelled the field as one would expect and the query threw an error.

There’s many a reason not to use SELECT * in application code; this is one I don’t think I’d heard before.
