Press "Enter" to skip to content

Curated SQL Posts

DATE_BUCKET() and an Alternative

Itzik Ben-Gan takes us through the DATE_BUCKET() function:

Bucketizing date and time data involves organizing data in groups representing fixed intervals of time for analytical purposes. Often the input is time series data stored in a table where the rows represent measurements taken at regular time intervals. For example, the measurements could be temperature and humidity readings taken every 5 minutes, and you want to group the data using hourly buckets and compute aggregates like average per hour. Even though time series data is a common source for bucket-based analysis, the concept is just as relevant to any data that involves date and time attributes and associated measures. For example, you might want to organize sales data in fiscal year buckets and compute aggregates like the total sales value per fiscal year. In this article, I cover two methods for bucketizing date and time data. One is using a function called DATE_BUCKET, which at the time of writing is only available in Azure SQL Edge. Another is using a custom calculation that emulates the DATE_BUCKET function, which you can use in any version, edition, and flavor of SQL Server and Azure SQL Database.

DATE_BUCKET() is something I’d like to see in the next version of SQL Server on-premises. There are some peculiarities to how it works and behavior isn’t always exactly what you’d expect, but it does accomplish what it sets out to do.

Comments closed

Diving into Prophet for Time Series Analysis

Dan Lantos continues a series on the Prophet library:

These plots give us a little insight into how the model is formed. The trend plot (top) exhibits a linear, piecewise function, with approximately appropriate values for our dataset throughout the years. This looks to be a baseline for predictions.
The weekly plot (middle) demonstrates some interesting behaviour – weekdays have a small negative impact on the predictions (approximately -50), and we see large spikes for the weekends. This appears peculiar, as we have no weekend data in our dataset, but it is a product of fitting a 7-day periodic function to only 5 days of data. Thankfully, this won’t be an issue as we have no need to forecast weekends.
The yearly plot (bottom) shows a much more volatile impact on predictions (-200 to +180) with frequent changepoints throughout. This points to a more sensitive and complex relationship between the time of year and the FTSE100 index than the day of the week.

If you’re already familiar with techniques like ARMA or ARIMA, this post will let you see immediately what the key differences are.

Comments closed

Logic Branching in Powershell

Robert Cain continues to have fun with Powershell:

In this post, we’ll see how some of PowerShell’s logic branching works. Logic branching is where you execute certain lines of code only when a condition is true.

For all of the examples, we’ll display the code, then under it the result of our code. In this article I’ll be using PowerShell Core, 7.1.3, and VSCode. The examples should work in PowerShell 5.1 in the PowerShell IDE, although they’ve not been tested there.

As always, Robert has a large number of examples here to illuminate the process.

Comments closed

Finding Procedures Using SELECT *

Michael J. Swart hunts for the real performance killer:

I have trouble with procedures that use SELECT *. They are often not “Blue-Green safe“. In other words, if a procedure has a query that uses SELECT * then I can’t change the underlying tables can’t change without causing some tricky deployment issues. (The same is not true for ad hoc queries from the application).

I also have a lot of procedures to look at (about 5000) and I’d like to find the procedures that use SELECT *.
I want to maybe ignore SELECT * when selecting from a subquery with a well-defined column list.
I also want to maybe include related queries like OUTPUT inserted.*.

Read on to see Michael’s strategy for attacking the problem while not including benign instances of it (such as WHERE EXISTS (SELECT * ...), which won’t cause any issues because the database engine doesn’t expand that wildcard).

Comments closed

Scaling ADF and Synapse Analytics Pipelines

Paul Andrew has a process for us:

Back in May 2020 I wrote a blog post about ‘When You Should Use Multiple Azure Data Factory’s‘. Following on from this post with a full year+ now passed and having implemented many more data platform solutions for some crazy massive (technical term) enterprise customers I’ve been reflecting on these scenario’s. Specifically considering:

– The use of having multiple regional Data Factory instances and integration runtime services.

– The decoupling of wider orchestration processes from workers.

Furthermore, to supplement this understanding and for added context, in December 2020 I wrote about Data Factory Activity Concurrency Limits – What Happens Next? and Pipelines – Understanding Internal vs External Activities. Both of which now add to a much clearer picture regarding the ability to scale pipelines for the purposes of large-scale extraction and transformation processes.

Read on for details about the scenario, as well as a design pattern to explain the process. This is a large solution for a large-scale problem.

Comments closed

Creating a REST API on SQL Server Data

Tomaz Kastrun needs to get at some data:

REST API (or RESTful API) is a application programming interface (abbreviated as API) that complies with REST architectural style and is created to allow communication, information flow and interaction with REST (Representational State Transfare) web services. API is used as an interface for communication between two or more programs (or machines) and provides content required from one side to another side (known as consumer and producer; call/request and response).

Getting data from and to database using API is great for building integration with application and retrieving to client all the information in JSON format only once, instead of every time an action occurs on frontend. Yes there are multiple ways to push data to frontend, it can be entity framework, it can be Dapper, it can be T-SQL with procedures layer, it can be REST API, and many many others.

Click through to see how Tomaz does it with Node.Js.

Comments closed

Excel Cube Functions and Lambdas for Grouping

Chris Webb continues a series on lambda helper functions in Excel:

In the last post in this series I showed how you can use Excel’s new Lambda helper functions to return tables. In this post I’ll show you how you can use them to return a dynamic array of CubeSet functions which can be used to build a histogram and do the kind of ABC-type analysis that can be difficult to do in a regular Power BI report.

Read on to see a pair of examples along these lines.

Comments closed

Rounding Times in SQL Server

Steve Stedman has the low-down on time rounding in SQL Server:

One thing that I end up having to look search on regularly is rounding of dates and times in Transact SQL, having looked this up too many times I finally realized that it is time for me to do my own blog post for it.

First off, whats the difference between rounding and truncating in these examples. Rounding rounds to the closest second, so 10:00:31 is rounded up to 10:01:00, and 10:00:29 is rounded down to 10:00:00. With truncation, it simple changes the truncated area to 0’s. so 10:00:31 gets truncated down to 10:00:00, and so does 10:00:59. Sometimes you may want rounding, and sometimes you may want truncation (floor) for your specific needs.

After having used date_trunc() in Postgres, I’d really like something similar in SQL Server.

Comments closed

Using Temporal Tables for Created and Updated Timestamps

Daniel Hutmacher has an interesting use case for temporal tables:

You have a table that you want to add “created” and “updated” timestamp columns to, but you can’t update the application code to update those columns. In the bad old times, you had to write a trigger to do the hard work for you. Triggers introduce additional complexity and potentially even a performance impact.

So here’s a nicer way to do it, trigger-free.

Click through for the solution, as well as a warning from Daniel.

Comments closed