Category: T-SQL

From API Call to ML Services Prediction

Published 2021-08-20 by Kevin Feasel

From the previous two blog posts:
– Creating REST API for reading data from Microsoft SQL Server in web browser
– Writing Data to Microsoft SQL Server from web browser using REST API and node.js
We have looked into the installation process of Node.js and the setup of Microsoft SQL Server, and made a couple of examples on reading data from the database through a REST API and inserting data back into the database.
In this post, we will be looking at R predictions using API calls against a sample dataset.

Click through to see it in action.

Eliminating False Positives when Searching for NOLOCK

Published 2021-08-19 by Kevin Feasel

Aaron Bertrand carries a lantern around Athens, searching for an honest NOLOCK:

In Part 1 of this series, I showed how to identify “NOLOCK in DML” patterns within a single statement using a Visitor pattern. In Part 2, I showed how to find those patterns in procedure and trigger bodies across multiple databases and instances. I still need to show how to eliminate false positives without tedious visual inspection.

Read on to see how we can narrow down the likelihood of a false positive when searching for the worst sorts of NOLOCK queries.

Joining to STRING_SPLIT

Published 2021-08-18 by Kevin Feasel

Kevin Wilkie explains that the STRING_SPLIT() function isn’t something one simply joins to:

My friends! Last time together, we discussed using the STRING_SPLIT function and how it’s used in combination with the CROSS APPLY.
First off, most of us are used to working with an INNER JOIN instead of CROSS APPLY. Well, you’re not going to be able to use an INNER JOIN when you’re using the STRING_SPLIT function.

Read on for a demonstration.
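
As a minimal sketch of the pattern the excerpt describes, the query below splits a comma-delimited column with CROSS APPLY. The table variable @Orders and its columns are hypothetical, and STRING_SPLIT needs database compatibility level 130 or higher.

```sql
-- Hypothetical sample data: one delimited list of tags per order.
DECLARE @Orders TABLE (OrderID int, Tags varchar(100));

INSERT INTO @Orders (OrderID, Tags)
VALUES (1, 'red,green'), (2, 'blue');

-- CROSS APPLY evaluates STRING_SPLIT once per outer row,
-- so the function is allowed to reference o.Tags.
SELECT o.OrderID, s.value AS Tag
FROM @Orders AS o
    CROSS APPLY STRING_SPLIT(o.Tags, ',') AS s;

-- Swapping CROSS APPLY for INNER JOIN ... ON 1 = 1 fails here:
-- a joined table-valued function cannot reference a column
-- from another table in the same FROM clause.
```

An INNER JOIN to STRING_SPLIT does work when the input is a literal or a variable rather than a column from another table in the query.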

DATE_BUCKET() and an Alternative

Published 2021-08-11 by Kevin Feasel

Itzik Ben-Gan takes us through the DATE_BUCKET() function:

Bucketizing date and time data involves organizing data in groups representing fixed intervals of time for analytical purposes. Often the input is time series data stored in a table where the rows represent measurements taken at regular time intervals. For example, the measurements could be temperature and humidity readings taken every 5 minutes, and you want to group the data using hourly buckets and compute aggregates like average per hour. Even though time series data is a common source for bucket-based analysis, the concept is just as relevant to any data that involves date and time attributes and associated measures. For example, you might want to organize sales data in fiscal year buckets and compute aggregates like the total sales value per fiscal year. In this article, I cover two methods for bucketizing date and time data. One is using a function called DATE_BUCKET, which at the time of writing is only available in Azure SQL Edge. Another is using a custom calculation that emulates the DATE_BUCKET function, which you can use in any version, edition, and flavor of SQL Server and Azure SQL Database.

DATE_BUCKET() is something I’d like to see in the next version of SQL Server on-premises. There are some peculiarities to how it works and behavior isn’t always exactly what you’d expect, but it does accomplish what it sets out to do.
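
To make the idea of a custom bucketing calculation concrete, here is a simplified sketch using DATEADD and DATEDIFF. It is not Itzik's exact formula: it assumes every timestamp falls on or after the origin and that the origin sits on an hour boundary. The table variable @Readings, the origin, and the bucket width are all made up for illustration.

```sql
-- Hypothetical readings taken at irregular times.
DECLARE @Readings TABLE (ReadingTime datetime2(0), Temperature decimal(5,2));

INSERT INTO @Readings (ReadingTime, Temperature)
VALUES ('2021-08-11 10:05', 20.0),
       ('2021-08-11 10:35', 22.0),
       ('2021-08-11 11:10', 25.0),
       ('2021-08-11 12:20', 19.0);

DECLARE @origin datetime2(0) = '2021-01-01', -- assumed bucket anchor
        @width  int          = 1;            -- bucket width in hours

-- Integer division snaps the hour offset down to a multiple of @width;
-- adding it back to the origin yields the start of the bucket.
SELECT b.BucketStart,
       AVG(r.Temperature) AS AvgTemperature
FROM @Readings AS r
    CROSS APPLY (VALUES (DATEADD(hour,
                                 DATEDIFF(hour, @origin, r.ReadingTime) / @width * @width,
                                 @origin))) AS b(BucketStart)
GROUP BY b.BucketStart
ORDER BY b.BucketStart;
```

Timestamps earlier than the origin would need extra handling, because integer division truncates toward zero rather than rounding down.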

Finding Procedures Using SELECT *

Published 2021-08-10 by Kevin Feasel

Michael J. Swart hunts for the real performance killer:

I have trouble with procedures that use SELECT *. They are often not “Blue-Green safe”. In other words, if a procedure has a query that uses SELECT * then I can’t change the underlying tables without causing some tricky deployment issues. (The same is not true for ad hoc queries from the application).
I also have a lot of procedures to look at (about 5000) and I’d like to find the procedures that use SELECT *.
I want to maybe ignore SELECT * when selecting from a subquery with a well-defined column list.
I also want to maybe include related queries like OUTPUT inserted.*.

Read on to see Michael’s strategy for attacking the problem while not including benign instances of it (such as WHERE EXISTS (SELECT * ...), which won’t cause any issues because the database engine doesn’t expand that wildcard).
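
For contrast, a crude first pass might simply search procedure definitions for the text. This is not Michael's approach, only a naive sketch, and it over-reports exactly the benign cases mentioned above while missing variations in whitespace.

```sql
-- Naive text search over stored procedure definitions.
-- In a LIKE pattern the asterisk is an ordinary character, not a wildcard.
SELECT OBJECT_SCHEMA_NAME(p.object_id) AS SchemaName,
       p.name                          AS ProcedureName
FROM sys.procedures AS p
    INNER JOIN sys.sql_modules AS m
        ON m.object_id = p.object_id
WHERE m.definition LIKE '%SELECT *%'
ORDER BY SchemaName, ProcedureName;
```

Hits such as WHERE EXISTS (SELECT * ...) or a commented-out query still show up, which is why something smarter than string matching is worth the effort.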

The Downside of CASE Expressions in WHERE Clauses

Published 2021-08-10 by Kevin Feasel

Erik Darling explains a common performance problem:

Obviously, this extends to join clauses as well. A case expression there has just as much chance of causing performance issues.
This pattern crops up in some ORM queries that I’ve seen as well, but I haven’t been able to figure out the code that causes it.
Let’s look at why this causes problems!

Let us, shall we?
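
As a rough illustration of the general shape of the problem (not necessarily the exact queries in Erik's post), the sketch below contrasts a predicate wrapped in a CASE expression with an equivalent plain comparison. The temp table #Users and its index are hypothetical.

```sql
-- Hypothetical table with an index on the filtered column.
CREATE TABLE #Users
(
    Id         int NOT NULL PRIMARY KEY,
    Reputation int NOT NULL,
    INDEX ix_Reputation NONCLUSTERED (Reputation)
);

INSERT INTO #Users (Id, Reputation)
VALUES (1, 50), (2, 1500), (3, 20000);

-- The column is buried inside a CASE expression, so the optimizer
-- generally cannot seek on ix_Reputation and evaluates every row.
SELECT COUNT(*) AS HighRepUsers
FROM #Users AS u
WHERE CASE WHEN u.Reputation > 1000 THEN 1 ELSE 0 END = 1;

-- Logically equivalent predicate that leaves the column bare,
-- which makes an index seek possible.
SELECT COUNT(*) AS HighRepUsers
FROM #Users AS u
WHERE u.Reputation > 1000;

DROP TABLE #Users;
```

Both queries return the same count; the difference only shows up in the execution plan and becomes noticeable on larger tables.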

Generating Mock Data for SQL Server

Published 2021-08-09 by Kevin Feasel

Chad Callihan has a few options for creating fake data:

It’s easy enough to create a handful of records for testing in SQL Server. What if you want 100 rows or 1000 rows? What if you want data that looks more legitimate compared to gibberish? In this post, we’ll look at different ways to generate mock data.

One of the trickiest things about creating mock data is getting the distributions right. For example, ABS(CHECKSUM(NEWID())) is great (just as RAND(CHECKSUM(NEWID())) is), but the results follow a uniform distribution because of the nature of checksums and random number generators. This makes charting numeric values look unnatural. Here’s an example I put together of generating data off of a normal distribution. It does take more effort, but if you’re generating this fake data to show it to users in tools like Power BI or Tableau, having data follow reasonable distributions is a good thing. That is, use whatever distribution makes sense for the particular data element: uniform, normal, Pareto (power law), gamma, etc.
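
One way to turn uniform random numbers into normally distributed ones is the Box-Muller transform. The sketch below is an independent illustration of that technique, not the linked example; the mean, the standard deviation, and the use of sys.all_columns as a row source are arbitrary choices.

```sql
DECLARE @mean  float = 100.0, -- assumed target mean
        @stdev float = 15.0;  -- assumed target standard deviation

-- Box-Muller: two independent uniform values become one standard
-- normal value, which is then scaled and shifted.
SELECT TOP (1000)
       @mean + @stdev * SQRT(-2.0 * LOG(r.u1)) * COS(2.0 * PI() * r.u2) AS NormalValue
FROM sys.all_columns AS ac -- used only as a convenient source of rows
    CROSS APPLY (VALUES (1.0 - RAND(CHECKSUM(NEWID())), -- keeps u1 above zero for LOG
                         RAND(CHECKSUM(NEWID())))) AS r(u1, u2);
```

LOG in T-SQL is the natural logarithm, which is what the transform calls for. Charting NormalValue as a histogram should show the familiar bell shape rather than a flat line.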

The Basics of Table Variables

Published 2021-08-06 by Kevin Feasel

Jared Poche explains some of the value of table variables:

In having a talk reviewed recently, it was suggested I spend more time defining some of the subjects I touched on. It occurred to me that if I should go over (or at least introduce) these ideas during a talk for a SQL Saturday audience, some might find a post on the subject useful. Hence my recent post on key lookups.
Another such topic is table variables. I use table variables frequently at my current job, but they came up very infrequently when I worked at CSS in Microsoft. I remember the conversations about them being very simple at the time, as in, “you should just use temp tables instead.” But there is a lot of utility with table variables, and they could be a useful arrow in your quiver.

Click through for the full story.

Ways to Avoid the MERGE Operator

Published 2021-08-05 by Kevin Feasel

Michael J. Swart has important bullet points:

Aaron Bertrand has a post called Use Caution with SQL Server’s MERGE Statement. It’s a pretty thorough compilation of all the problems and defects associated with the MERGE statement that folks have reported in the past. But it’s been a few years since that post and in the spirit of giving Microsoft the benefit of the doubt, I revisited each of the issues Aaron brought up.
Some of the items can be dismissed based on circumstances. I noticed that:
– Some of the issues are fixed in recent versions (2016+).
– Some of the issues that have been marked as won’t fix have been fixed anyway (the repro script associated with the issue no longer fails).
– Some of the items are complaints about confusing documentation.
– Some are complaints about issues that are not limited to the MERGE statement (e.g. concurrency and constraint checks).

Spoilers: some + some + some + some is still a lot less than all. Read the whole thing.
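
For readers who would rather sidestep MERGE for a simple upsert, one commonly cited alternative is an UPDATE followed by a conditional INSERT inside a transaction. The sketch below shows that general pattern and is not necessarily what the linked post recommends; the temp table #Inventory and the variable values are hypothetical.

```sql
CREATE TABLE #Inventory
(
    ProductId int NOT NULL PRIMARY KEY,
    Quantity  int NOT NULL
);

DECLARE @ProductId int = 42,
        @Quantity  int = 5;

BEGIN TRANSACTION;

-- UPDLOCK and SERIALIZABLE hold a key-range lock so that another
-- session cannot insert the same key between the UPDATE and INSERT.
UPDATE #Inventory WITH (UPDLOCK, SERIALIZABLE)
SET Quantity = Quantity + @Quantity
WHERE ProductId = @ProductId;

-- No row was updated, so the key does not exist yet.
IF @@ROWCOUNT = 0
BEGIN
    INSERT INTO #Inventory (ProductId, Quantity)
    VALUES (@ProductId, @Quantity);
END;

COMMIT TRANSACTION;

SELECT ProductId, Quantity FROM #Inventory;

DROP TABLE #Inventory;
```

Running the middle section a second time takes the UPDATE branch instead and adds to the existing quantity.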

Analytic Window Functions in SQL Server

Published 2021-08-03 by Kevin Feasel

Aveek Das takes a look at a few analytic window functions available since SQL Server 2012:

Since the introduction of SQL Server 2012, the analytic functions were added to the SQL Server database engine. Any version of SQL Server after SQL Server 2012 can execute analytic queries on it. These functions are used to calculate an aggregated value from the dataset but are based on a specific set of rows instead of the entire dataset. As compared to aggregate functions like SUM, COUNT, AVG, etc. which return scalar records, these functions can return multiple records based on the conditions. The most common examples of using these functions are to find moving averages, running totals, etc. SQL Server supports the following analytic functions.
1. CUME_DIST – Finds the cumulative distribution of a numerical column
2. FIRST_VALUE – Finds the first value of a column from the group and prints the same for each row
3. LAST_VALUE – Finds the last value of a column from the group and prints the same for each row
4. LAG – Reads the value of a column from a row the specified number of rows before the current row
5. LEAD – Reads the value of a column from a row the specified number of rows after the current row

Click through for examples of how each works.
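
As a quick sketch of the five functions side by side, the query below runs them over a tiny, made-up sales table. The table variable @Sales and its values are hypothetical.

```sql
DECLARE @Sales TABLE (SaleDate date, Amount int);

INSERT INTO @Sales (SaleDate, Amount)
VALUES ('2021-08-01', 100),
       ('2021-08-02', 150),
       ('2021-08-03', 120);

SELECT s.SaleDate,
       s.Amount,
       -- Value from one row earlier in date order (NULL on the first row).
       LAG(s.Amount, 1)  OVER (ORDER BY s.SaleDate) AS PreviousAmount,
       -- Value from one row later in date order (NULL on the last row).
       LEAD(s.Amount, 1) OVER (ORDER BY s.SaleDate) AS NextAmount,
       FIRST_VALUE(s.Amount) OVER (ORDER BY s.SaleDate) AS FirstAmount,
       -- The default frame ends at the current row, so an explicit
       -- frame is needed to reach the true last row.
       LAST_VALUE(s.Amount) OVER (ORDER BY s.SaleDate
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                           AND UNBOUNDED FOLLOWING) AS LastAmount,
       -- Fraction of rows with an Amount less than or equal to this one.
       CUME_DIST() OVER (ORDER BY s.Amount) AS CumulativeDistribution
FROM @Sales AS s
ORDER BY s.SaleDate;
```

Without the explicit frame, LAST_VALUE would simply return the current row's Amount, which is a common source of confusion.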
