Press "Enter" to skip to content

Curated SQL Posts

Prior Year to a Specific Date in DAX

Alberto Ferrari lets us compare up to specific dates between years:

Unfortunately, the calculation is not perfect. At the year level, it compares the full previous year against an incomplete current year – in this example there are no sales after September 5th in the current year.

Besides, the problem appears not only at the year level, but also at the month level. Indeed, in September the Previous Year measure returns sales for the entire month of September in the previous year. The comparison is unfair, as there are only five days’ worth of sales in September of the current year.

Read on for a better technique.

Comments closed

Hot Patching Azure SQL Database

Hans Olav Norheim has an interesting paper on a technique Microsoft uses to release SQL Server patches for Azure SQL Database while minimizing downtime:

The SQL Engine we are running in Azure SQL Database is the very latest version of the same engine customers run on their own servers, except we manage and update it. To update SQL Server or the underlying infrastructure (i.e. Service Fabric or the operating system), we must stop the SQL Server process. If that process hosts the primary database replica, we move the replica to another machine (requiring a failover).
 
During failover, the database may be offline for a second and still meet our 99.995% SLA. However,  failover of the primary replica impacts workload because it aborts in-flight queries and transactions. We built features such as resumable index (re)build and accelerated database recovery to address these situations, but not all running operations are automatically resumable. It may be expensive to restart complex queries or transactions that were aborted due to an upgrade. So even though failovers are quick, we want to avoid them.

Read on to see how they do it. There’s no on-prem analogue yet, though perhaps that will come in time.

Comments closed

Immediate Streaming and the Power BI Data Gateway

Gilbert Quevauvilliers doesn’t have time to wait:

Currently as far as I understand it the On-Premise Data Gateway will wait and buffer some data before sending it through to the Power BI Service. By changing the setting below in the On-Premise Data Gateway, it will start streaming the data almost immediately.

I am fortunate enough to be really good mates with Phil Seamark who so part of the Power BI CAT team and he gave me a little nugget of gold that I would like to share with you.

Read on to see how to configure the gateway to stream immediately.

Comments closed

LSTM in Databricks

Vedant Jain shows us an example of solving a multivariate time series forecasting problem using LSTM networks:

LSTM is a type of Recurrent Neural Network (RNN) that allows the network to retain long-term dependencies at a given time from many timesteps before. RNNs were designed to that effect using a simple feedback approach for neurons where the output sequence of data serves as one of the inputs. However, long term dependencies can make the network untrainable due to the vanishing gradient problem. LSTM is designed precisely to solve that problem.

Sometimes accurate time series predictions depend on a combination of both bits of old and recent data. We have to efficiently learn even what to pay attention to, accepting that there may be a long history of data to learn from. LSTMs combine simple DNN architectures with clever mechanisms to learn what parts of history to ‘remember’ and what to ‘forget’ over long periods. The ability of LSTM to learn patterns in data over long sequences makes them suitable for time series forecasting.

This is a nice overview and as a bonus, there’s a notebook as well where you can try it on your own.

Comments closed

Plan Forcing in Query Store

Erin Stellato continues a series on why you should be interested in Query Store:

Before you go too crazy trying to stabilize performance with hundreds of forced plans, here are a couple things worth mentioning about plan forcing.

1. I think of plan forcing as a temporary solution. I do not view it as a “set it and forget it” option.
2. Use T-SQL for forcing and un-forcing (versus the UI) so you can track when it was done, and by whom, in change control. Nothing is entered in the ERRORLOG, or anywhere else, and there is currently no event that captures it.
3. Evaluate both the execution statistics (average and total) and the different plans for the query to determine the “best” plan to force.

Erin makes some great points in this post, so read the whole thing.

Comments closed

Problems with Pivoting

Itzik Ben-Gan wraps up an outstanding series:

When people want to pivot data using T-SQL, they either use a standard solution with a grouped query and CASE expressions, or the proprietary PIVOT table operator. The main benefit of the PIVOT operator is that it tends to result in shorter code. However, this operator has a few shortcomings, among them an inherent design trap that can result in bugs in your code. Here I’ll describe the trap, the potential bug, and a best practice that prevents the bug. I’ll also describe a suggestion to enhance the PIVOT operator’s syntax in a way that helps avoid the bug.

If you use the PIVOT operator, you definitely want to read this article.

Comments closed

Scaling Out Continuous Integration

Chris Adkin shows off parallelism in Azure DevOps continuous integration pipelines:

A SQL Server data tools project is checked out of GitHub, built into a DacPac, four containerized SQL Server instances are spun up using clones of the ‘Seed’ docker volume. The DacPac is applied to a database running inside each container, which a tSQLt test is then executed against, finally, at the end very end the tSQLt results are aggregate and published.

This is an interesting approach to the problem of lengthy tests: run them on several separate machines concurrently.

Comments closed

Renaming Multiple Columns with Power Query

Matt Allington shows how you can use M to rename all columns at once in a table in Power Query:

When you are using a matrix like this, it can be difficult to tell which “Year” column is coming from which table, as shown below.

One solution to this problem is to rename all the columns in each table by pre-pending Order or Delivery to the front of the existing column names.  Once that is done, it is much clearer which column is which.

Matt describes the concept for you, but also has a video showing how to do this.

Comments closed

Getting the Last Actual Plan

Grant Fritchey shows off an improvement in SQL Server 2019:

I’ve always felt responsible for making such a big deal about the differences between estimated and actual plans. I implied in the first edition of the execution plans book (get the new, vastly improved, 3rd edition in digital form for free here, or you can pay for the print version) that these things were so radically different that the estimated plan was useless. This is false. All plans are estimated plans. However, actual plans have some added runtime metrics. It’s not that we’re going to get a completely different execution plan when we look at an actual plan, it’s just going to have those very valuable runtime metrics. The problem with getting those metrics is, you have to execute the query. However, this is no longer true in SQL Server 2019 (CTP 2.4 and greater) thanks to sys.dm_exec_query_plan_stats

Click through for an example, as well as what you need to do to enable this.

Comments closed