Press "Enter" to skip to content

Category: T-SQL

Calculating Exponential Moving Average in T-SQL

Rick Dobson watches the flow:

Exponential moving averages (EMAs) are a powerful means of detecting changes in time series data. However, if you are new to this task, you may be wondering how to choose from conflicting advice about how to calculate EMAs. This tip reviews several of the most popular methods for calculating moving averages. Additionally, this tip presents T-SQL code samples with common table expressions and stored procedures for generating EMAs from an underlying time series dataset.

“EMAs don’t just track trends; they reveal momentum in motion.” That’s why they’re favored when recent values matter most, and why this tip focuses on helping you calculate them with precision.

Read on for the formula and a couple of lengthy scripts to generate it.
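For reference, the standard formula is EMA(t) = alpha * x(t) + (1 - alpha) * EMA(t - 1), with alpha = 2 / (N + 1) for an N-period average. As a minimal sketch of one common approach, here is a recursive CTE against a hypothetical dbo.prices table with a gapless row_num ordering column; the tip itself walks through several methods in more depth:

    -- 10-period EMA via recursive CTE; alpha = 2 / (N + 1).
    -- Assumes dbo.prices(row_num INT, close_price DECIMAL(18,4)),
    -- where row_num is a gapless, 1-based ordering of the series.
    DECLARE @alpha FLOAT = 2.0 / (10 + 1);

    WITH ema AS
    (
        SELECT p.row_num, p.close_price,
               CAST(p.close_price AS FLOAT) AS ema_value
        FROM dbo.prices AS p
        WHERE p.row_num = 1   -- seed the EMA with the first observation

        UNION ALL

        SELECT p.row_num, p.close_price,
               @alpha * p.close_price + (1 - @alpha) * e.ema_value
        FROM ema AS e
            INNER JOIN dbo.prices AS p
                ON p.row_num = e.row_num + 1
    )
    SELECT row_num, close_price, ema_value
    FROM ema
    OPTION (MAXRECURSION 0);   -- the series may exceed the default 100 recursion levels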


Comparing Sets of Data in T-SQL

Louis Davidson figured out which of these was not like the others, which of these just didn’t belong:

There are many occasions when we want to see the differences between two sets of data. Sometimes it's a whole table, a subset of a table, or even the results of a couple of queries, as in a unit test.

Maybe you want to see that two sets are exactly the same, for example, a domain table in DEV and PROD, or maybe even from source control. You might have an orders table and an orders_history table and want to see the overlap/changes over a given period of time, for example, to clean out any useless history.

No matter the reason, there is a query pattern that will work for you. In this post, I will demonstrate several of these techniques and why you might want to use them in different places.

Click through for those techniques. I am particularly fond of INTERSECT/EXCEPT because of how it handles NULLs, and it typically performs quite well.
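As a quick illustration of why I like that pattern, here is a minimal sketch assuming two hypothetical tables, dbo.orders and dbo.orders_history, with matching column lists. Unlike a join on every column, EXCEPT treats NULLs as equal, so rows that differ only in where the NULLs sit do not show up as false mismatches:

    -- Rows in orders with no exact match in orders_history
    SELECT order_id, customer_id, order_date
    FROM dbo.orders
    EXCEPT
    SELECT order_id, customer_id, order_date
    FROM dbo.orders_history;

    -- Rows in orders_history with no exact match in orders
    SELECT order_id, customer_id, order_date
    FROM dbo.orders_history
    EXCEPT
    SELECT order_id, customer_id, order_date
    FROM dbo.orders;

If both queries return nothing, the two sets are identical.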


Random Number Generation in T-SQL via Marsaglia Polar Method

Sebastiao Pereira implements a method for generating random numbers in T-SQL:

Generating random numbers from a normal distribution is essential for accurate and realistic modeling, simulation, inference, and algorithm design in scientific, engineering, statistical, and AI domains. How can we build a random number generator using the Marsaglia polar method in SQL Server without the use of external tools?

It’s an interesting technique: it draws uniform points in a square, keeps only the ones that land inside the unit circle, and transforms the survivors into pairs of independent, normally distributed values.
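As a rough sketch of the underlying mechanics (my own simplified version, using RAND() as the uniform source rather than whatever Sebastiao builds), the method rejects points outside the unit circle and scales the survivors:

    -- One pair of independent standard normal values via the
    -- Marsaglia polar method. Assumes RAND() is an acceptable
    -- uniform source; the article may use something more rigorous.
    DECLARE @u FLOAT, @v FLOAT, @s FLOAT = 1.0, @m FLOAT;

    WHILE @s >= 1.0 OR @s = 0.0   -- reject points outside the unit circle (and the origin)
    BEGIN
        SET @u = 2.0 * RAND() - 1.0;   -- uniform on (-1, 1)
        SET @v = 2.0 * RAND() - 1.0;
        SET @s = @u * @u + @v * @v;
    END;

    SET @m = SQRT(-2.0 * LOG(@s) / @s);   -- LOG() is the natural log in T-SQL
    SELECT @u * @m AS normal_1,
           @v * @m AS normal_2;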


Simple Data Quality Validation with T-SQL

Kenneth Omorodion builds a validation process:

As the need for and use of data grows within any organization, there is a corresponding rise in the need for data quality validation. Most organizations have large stores of data, but most of it is not managed efficiently in terms of data quality assurance, leading to inaccurate insights for the business, which in turn leads to distrust in the data.

Organizations have now, more than ever, realized the importance of an efficient data quality process as part of their Business Intelligence and Analytics processes. The issue is, how can they implement data quality for their data? Larger and more data-centric organizations might use pre-built data management and validation tools like Microsoft Purview or Master Data Management tools like Informatica, Talend, SAP, and Stibo Systems. But organizations that cannot commit to subscribing to pre-built options, or that operate primarily in on-premises environments, might want to build one themselves. That's where this article comes in.

I’m of two minds about this. On the one hand, I appreciate the effort that Kenneth put into this and expect that it would work reasonably well. On the other hand, I look at what it can do and say, “Yes, but if you just use constraints like you should, then you don’t need this process.” It’s basically a very asynchronous way of applying check constraints, foreign key constraints, and specifying that an attribute is NOT NULL.

If there’s some reason why applying these constraints is impossible—such as receiving this data as an extract from some poorly-designed system—then this can do a good job of describing the failures of the upstream system. But this is where data architects need to get it right up-front.
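For contrast, here is what those same rules look like applied declaratively, on a made-up pair of tables. The checks run synchronously at write time, so bad rows never land:

    -- Hypothetical tables; the point is that NOT NULL, CHECK, and
    -- FOREIGN KEY constraints enforce the rules at insert time
    -- rather than flagging violations after the fact.
    CREATE TABLE dbo.customers
    (
        customer_id INT NOT NULL
            CONSTRAINT PK_customers PRIMARY KEY
    );

    CREATE TABLE dbo.orders
    (
        order_id INT NOT NULL
            CONSTRAINT PK_orders PRIMARY KEY,
        customer_id INT NOT NULL
            CONSTRAINT FK_orders_customers
                REFERENCES dbo.customers (customer_id),
        quantity INT NOT NULL
            CONSTRAINT CK_orders_quantity CHECK (quantity > 0)
    );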


Using Temporary Stored Procedures to Output Common Messages

Louis Davidson shows a neat use for temporary stored procedures:

On another connection (on another computer, for that matter), I am right now doing some pretty long loads of some test data. The script is composed of six queries, and they each may take 10 minutes (not completely sure; this is my first run of the scripts). And of course, I want to get some feedback on these queries to know how long they are taking.

One common way to do this is to put a PRINT statement between the queries so you can see the progress. But PRINT statements are notorious for one thing: caching the output until the output buffer reaches some level.

One addition I’d make to Louis’s post is to make use of the FORMATMESSAGE() functionality that SQL Server 2016 introduced. This use case is right in its wheelhouse.

    SET @message = FORMATMESSAGE(N'%s%s%s%s',
			@Message,
			CASE
				WHEN @AddTimeToMessageFlag = 1 THEN CONCAT(N' : Message Time - ', SYSDATETIME())
				ELSE N''
			END,
			CASE
				WHEN @AddSpidToMessageFlag = 1 THEN CONCAT(N' : ProcessId - ', @@spid)
				ELSE N''
			END,
			CASE
				WHEN @AddOriginalLoginToOutputFlag = 1 THEN CONCAT(N' : LoggedInUserId - ', original_login())
				ELSE N''
			END); 

FORMATMESSAGE() provides a moderate readability benefit versus a lengthy CONCAT(). And if you always want to emit all fields, rather than keep the optional setup that Louis has in place, FORMATMESSAGE() makes the result even clearer to understand.

SET @message = FORMATMESSAGE(N'%s : Message Time - %s : ProcessId - %i : LoggedInUserId - %s',
			@Message,
			CAST(SYSDATETIME() AS NVARCHAR(100)),
			@@spid,
			original_login());

Local Variables and Cardinality Estimates in SQL Server

Erik Darling lays out an explanation of what SQL Server does to estimate the cardinality of queries using local variables. Erik covers some of the conceptual questions around how a compiler could work, and then explains what SQL Server does.
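If you want to see the effect for yourself, a minimal repro looks something like this. The table and column names are made up, and this is my sketch of the standard behavior rather than Erik's example:

    -- With a literal, the optimizer can use the histogram step for 1000.
    SELECT COUNT(*)
    FROM dbo.OrderLines
    WHERE Quantity = 1000;

    -- With a local variable (and no RECOMPILE hint), the value is
    -- unknown at compile time, so the estimate falls back to the
    -- density vector: roughly table row count * all-density.
    DECLARE @qty INT = 1000;

    SELECT COUNT(*)
    FROM dbo.OrderLines
    WHERE Quantity = @qty;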

Erik talks about dependency on specific behavior and how people get upset when things change, and he's absolutely right. My favorite example of this is the “quirky update” technique people used to calculate running totals back in the day, prior to SQL Server 2012. That technique relied on unsupported, accidental-but-repeatable behavior around clustered indexes on temp tables. Even though the SQL Server team explicitly stated that this behavior could change at any time, it was clear that they didn't go out of their way to break that functionality.


Decimal Precision and Rounding in SQL Server

Jiri D. provides a warning:

Do you ever worry about how you declare NUMERIC or DECIMAL data types in SQL?
Do you sometimes “add a bit more” precision—just to be safe?
Have you considered how that small decision could actually change your arithmetic results?

I ran into this recently when comparing data between two environments that should have produced identical results. One calculated field was slightly off — and the culprit turned out to be a difference in numeric data type declarations.

Read on to see what happened. The differences weren't massive, but if you were expecting an exact match, seeing a difference even at the seventh or eighth digit after the decimal could be jarring.
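To see the sort of thing that can bite you (my own made-up example, not Jiri's), compare the same division under two different declarations. SQL Server caps result precision at 38, so "extra safe" wide operands can force the result scale down:

    DECLARE @a_narrow DECIMAL(18, 4)  = 1.0, @b_narrow DECIMAL(18, 4)  = 3.0;
    DECLARE @a_wide   DECIMAL(38, 20) = 1.0, @b_wide   DECIMAL(38, 20) = 3.0;

    -- The narrow declaration keeps 20 digits of scale in the quotient;
    -- the wide one gets squeezed down to the minimum of 6, so the two
    -- "identical" calculations differ from the seventh decimal onward.
    SELECT @a_narrow / @b_narrow AS narrow_result,  -- 0.33333333333333333333
           @a_wide / @b_wide     AS wide_result;    -- 0.333333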


Adding Commas to Numeric Output in SQL Server

Andy Yun is speaking my language:

One thing that’s always driven me crazy is having large numbers in my result sets without commas for readability. For a lot of different things I do, the more commas a number has, the more attention I want to give to that value.

Andy shows examples of formatting to two and to zero places after the decimal. In a talk I give on analyzing business data with T-SQL, I also demonstrate how to show currency-based results:

FORMAT(SUM(o.Quantity * sih.LastCostPrice), N'$#,##0.00') AS TotalCost

This starts each cost record with a dollar sign, puts commas in the thousands places, and keeps a two-digit decimal value. That returns a result like $31,409,113.00, which is a lot easier to read than 31409113.
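If you don't need a custom pattern, the standard numeric format strings get you most of the way there. This is my own quick sketch against a made-up value; note that the output of the 'C' specifier depends on the session's culture:

    DECLARE @big_number INT = 31409113;

    SELECT FORMAT(@big_number, N'N2') AS two_decimals,   -- 31,409,113.00
           FORMAT(@big_number, N'N0') AS no_decimals,    -- 31,409,113
           FORMAT(@big_number, N'C2') AS currency;       -- $31,409,113.00 on en-US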
