2023-12-04 – Curated SQL

Predicting Forecast Errors of Ensemble Regression Models

Published 2023-12-04 by Kevin Feasel

Peter Laurinec builds a model to test a model:

In the last blog post about Multistep forecasting losses, I showed the usage of the fantastic method adam from the smooth R package on household electricity consumption data, and compared it with benchmarks.

Since I computed predictions from 10 methods/models for a long period of time, it would be nice to create some ensemble models for precise prediction for our household consumption data. For that purpose, it would be great to predict for example future errors of these methods. It is used in some known ensemble methods, which are not direct about stacking. Predicting errors can be beneficial for prediction weighting or for predicting the rank of methods (i.e. best one prediction). For the sake of learning something new, I will try multivariate regression models, so learning from multiple targets at once. At least, it has the benefit of simplicity, that we need only one model for all base prediction models.

Click through for Peter’s process. H/T R-Bloggers.

Set-Based vs Row-Based Code Considerations

Published 2023-12-04 by Kevin Feasel

Kevin Hill explains a concept:

In SQL Server, the terms “set-based” and “row-based” refer to different approaches or styles of writing SQL code to manipulate data. These styles have implications for performance, readability, and the way queries are processed. Let’s explore the differences between set-based and row-based code:

Click through for Kevin’s thoughts. One thing I’d re-emphasize (because Kevin did make this point), especially for people coming to SQL Server from Oracle, is that set-based operations are going to be more efficient than their row-based equivalents about 95-99% of the time. Oracle has a large number of optimizations to make cursor-style code efficient and T-SQL has very few of those, as set-based is the more natural expression of SQL.

One quick example of this is, prior to SQL Server 2012 and its extended support of window functions, the fastest officially supported way to calculate a running total was to build a cursor. The other alternatives, including self-joins, were much less efficient. There was an unsupported but much faster technique that relied on a peculiarity of how SQL Server sorts clustered indexes (the “quirky update” method), but because it relied on internals that could change with any patch, it was a risky maneuver.
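To make the running total comparison concrete, here is a minimal sketch of both styles. It is an illustration rather than SQL Server code: it runs against an in-memory SQLite database through Python's sqlite3 module (assuming a bundled SQLite of 3.25 or later, which is when SQLite gained window functions), and the sales table and its columns are hypothetical. The windowed SUM is the same shape of query that SQL Server 2012 and later accept.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, amount INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO sales (sale_id, amount) VALUES (?, ?)",
    [(1, 10), (2, 25), (3, 5), (4, 40)],
)

# Set-based: a single statement; the window function computes the running total.
set_based = conn.execute(
    """
    SELECT sale_id,
           SUM(amount) OVER (
               ORDER BY sale_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM sales
    ORDER BY sale_id
    """
).fetchall()

# Row-based: walk the rows one at a time, cursor-style, carrying the total along.
row_based = []
running_total = 0
for sale_id, amount in conn.execute(
    "SELECT sale_id, amount FROM sales ORDER BY sale_id"
):
    running_total += amount
    row_based.append((sale_id, running_total))

print(set_based)
print(row_based)
```

Both approaches produce the same rows; the difference on a real server is in how much work the engine can do in a single pass versus one fetch at a time.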

Using Extended Events in Azure Data Studio

Published 2023-12-04 by Kevin Feasel

Josephine Bush tries it out:

I know I can use extended events (xevents) in Azure SQL DB when in SSMS, but I wanted to learn how to use them in Azure Data Studio (ADS).

Click through to see the normal workflow in SQL Server Management Studio, followed by the workflow in Azure Data Studio. I’d also recommend, at some point, finding good extended events sessions and saving the T-SQL to create them.

Generating Reports in Power BI with Copilot

Published 2023-12-04 by Kevin Feasel

Soheil Bakhshi automates report creation:

In Nov 2023, Microsoft announced Microsoft Fabric’s general availability and Public Preview of Copilot in Microsoft Fabric. In a previous post, I explained what Copilot means to Power BI developers, which is valid for other Fabric developers such as data engineers and data scientists as Copilot for Fabric helps with those experiences as well. But the main focus of this blog post is to discuss the requirements, how to enable Copilot, and how to use it from a Power BI development point of view. So, this blog will not discuss other aspects of Copilot in Microsoft Fabric. With that, let’s begin.

I haven’t been particularly impressed with the reports it generates, but I suppose this is like the proverbial bear riding a unicycle: it’s not a question of how well it does the task that makes it interesting, but rather that it does it at all.

Grouping By Column Alias

Published 2023-12-04 by Kevin Feasel

Aaron Bertrand wants a feature:

GROUP BY queries can become overly convoluted if your grouping column is a complex expression. Because of the logical processing order of a query, you’re often forced to repeat such an expression since its alias can’t be used in the GROUP BY clause.

Oracle recently solved this in their 23c release by adding the ability to GROUP BY column_alias. This is such simple but powerful syntax, and I’m hoping we can get SQL Server to follow Oracle’s lead.

This would be a pretty nice feature. Admittedly, the workarounds aren’t that difficult, but it would be a welcome quality-of-life update.
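For reference, the two usual workarounds look roughly like this. This is a sketch under stated assumptions: the orders table, its columns, and the month expression are hypothetical, and the queries run against in-memory SQLite through Python's sqlite3 module rather than SQL Server. Both query shapes (repeating the expression, or naming it once in a CTE and grouping by that name) carry over to T-SQL, although the string function would differ there.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_date TEXT NOT NULL, amount INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (order_date, amount) VALUES (?, ?)",
    [("2023-11-03", 10), ("2023-11-20", 15), ("2023-12-04", 7)],
)

# Workaround 1: repeat the grouping expression in the GROUP BY clause.
repeated = conn.execute(
    """
    SELECT substr(order_date, 1, 7) AS order_month,
           SUM(amount) AS total
    FROM orders
    GROUP BY substr(order_date, 1, 7)
    ORDER BY order_month
    """
).fetchall()

# Workaround 2: name the expression once in a CTE, then group by that name.
derived = conn.execute(
    """
    WITH o AS (
        SELECT substr(order_date, 1, 7) AS order_month, amount
        FROM orders
    )
    SELECT order_month, SUM(amount) AS total
    FROM o
    GROUP BY order_month
    ORDER BY order_month
    """
).fetchall()

print(repeated)
print(derived)
```

The CTE form keeps the expression in one place, which matters more as the expression grows; grouping by the alias directly would remove the need for either form.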

What Is Microsoft Fabric?

Published 2023-12-04 by Kevin Feasel

Tomaz Kastrun starts a new series:

Microsoft Fabric is a next-gen platform, that brings all-in-one data and analytics solution for end users, small, medium and large enterprises. Services offer the complete data cycle movement (data ingestion, data engineering, data integration, data storing with warehouse using one lake), delivering data insights and building predictive models.

Read on for the overview and stay tuned for plenty more where that came from.

Testing a New SSIS Extension

Published 2023-12-04 by Kevin Feasel

Andy Leonard breaks it down for us:

How do I test new SQL Server Integration Services (SSIS) extensions?

I have a collection of virtual servers that run locally on my laptop. SSIS extensions for Visual Studio 2019 and Visual Studio 2022 are now separate downloads (see the links provided).

Click through for Andy’s workflow.
