April 2022 – Page 5 – Curated SQL

Calculating Running Totals with Window Functions

Published 2022-04-22 by Kevin Feasel

Steve Jones shows off a good use case for window functions:

Recently I was looking at some data and wanted to analyze it by month. I have a goal that is set for each day and then an actual value. I wanted to know how I was tracking against the goal, as a running total. If my goal is 10 a day, then I ought to actually get to 10 the first day, 20 for the second day (10 + 10), etc.

Read on to see how Steve solved the problem.

Comments closed

Building a 3D Regression Plane with a Scatter Plot in R

Published 2022-04-21 by Kevin Feasel

Sebastian Sauer embraces the third dimension:

Here’s the linear model with 2 predictors, giving us a model that can be visualized in 3D:
lm1 <- lm(mpg ~ hp + disp, data = mtcars)
As is standard, we’ll predict mpg.

Click through for a simple implementation using plotly.

Comments closed

Search in KQL

Published 2022-04-21 by Kevin Feasel

Robert Cain looks at the search operator in KQL:

In this post we will examine the KQL (Kusto Query Language) search operator. Search allows us to look across all columns in one or more tables for a specific text string.
The samples in this post will be run inside the LogAnalytics demo site found at https://aka.ms/LADemo. This demo site has been provided by Microsoft and can be used to learn the Kusto Query Language at no cost to you.

Click through to learn more about this very useful operator.

Comments closed

Entity Framework and Include Operations

Published 2022-04-21 by Kevin Feasel

Josh Darnell has a warning:

I can imagine someone reading that and not seeing the gravity of the situation. “Hey, 500 rows isn’t that many – we have modern hardware!”
I thought it was worth writing about a real world situation where this can get seriously out of hand.

Read on for a scenario in which 64 rows turns into 100,000 rows pretty quickly.

Comments closed

Quotation Marks in Azure SQL DB vs Managed Instance

Published 2022-04-21 by Kevin Feasel

Michael Bourgon notices a difference between Azure SQL DB and Managed Instance/box product:

I was trying to get some Xquery parsing working, and ran into a weird error:
"CONDITIONAL failed because the following SET options have incorrect settings: 'QUOTED_IDENTIFIER'"
Weird. Doubly so because this code has worked on-prem for a decade, and the Managed Instance is running it now.

Read on for the solution.

Comments closed

Using Buffer Pool Extension in SQL Server

Published 2022-04-21 by Kevin Feasel

Chad Callihan looks at buffer pool extension:

Perhaps you started out with X amount of memory when your SQL server was brought online and over time, with additional load and activity on that SQL server, users are not quite getting the type of performance they’re used to getting. Sure, you can buy more memory. What if that’s not an option?
If you’re running low on memory and need a little boost, enabling buffer pool extension can take advantage of an SSD as an “extension” for the buffer pool.

This is one of those interesting features that probably help a small number of customers but shouldn’t be generally useful. That’s because even with SSD performance improvements, memory is still a couple orders of magnitude faster, so as long as you have the ability to increase RAM, that brings much better performance.

Comments closed

The User Transaction Scope for Temporary Objects

Published 2022-04-21 by Kevin Feasel

Bob Dorr troubleshoots a performance problem:

When the temporary table is bound to the user transaction it is both created and destroyed as part of the transaction. The same logic in a procedure attempts to avoid the creation and destruction, for each execution, by using temporary table caching.
From the issue I was debugging, the user transaction scope mattered because creation and destruction of metadata may be an expensive operation.

This post ties into two separate things: first, how temp objects tie to specific sessions; and second, the cost of creating and destroying temporary objects. For the latter, a couple quick pieces of advice:

Reduce the number of temporary objects you create. If you can solve a problem with fewer temp tables or table variables while maintaining acceptable performance, that can help on busy systems.
Never explicitly drop temp tables. There’s no benefit to explicitly dropping temp tables, as they’ll go away as soon as the session ends. Also, not dropping temp tables is the first step to:
Embrace temp table reuse. There are specific rules around when you can re-use a temp table. Each re-use of a temp table means two fewer metadata operations (one delete and one create).
Use memory-optimized table variables instead of temp tables or table variables.
Turn on memory-optimized tempdb metadata. The biggest issue here is that you lose cross-database queries into tempdb views. That can end up being painful and is why I can’t recommend it as a general solution.

Comments closed

Simplifying a Complex Multi-Visual Chart

Published 2022-04-20 by Kevin Feasel

Amy Esselman re-designs a mess of a chart:

When faced with any unfamiliar but complicated graph, it can be helpful to think about it piece by piece to gain a better understanding of what’s being communicated. That way, we’ll have a better handle on how we can improve the overall visual.
The goal of this chart is to allow managers to compare their store’s performance against its forecasted range and the actual performance of other stores in the region.

Click through for the full process.

Comments closed

Iteratively Tuning Graph Neural Networks

Published 2022-04-20 by Kevin Feasel

Luis Bermudez takes us through the process of tuning one flavor of neural network:

We made our own implementations of OGB leaderboard entries for two popular GNN frameworks: GraphSAGE and a Relational Graph Convolutional Network (RGCN). We then designed and executed an iterative experimentation approach for hyperparameter tuning where we seek a quality model that takes minimal time to train. We define quality by running an unconstrained performance tuning loop, and use the results to set thresholds in a constrained tuning loop that optimizes for training efficiency.

Read on to see how they did it.

Comments closed

Consolidating Indexes

Published 2022-04-20 by Kevin Feasel

Erik Darling runs through an exercise:

The more columns you have in a table, the more potential column combinations there are for indexes. Much like columns, indexes tend to get added following the path of least resistance.
Very rarely does someone consider current indexes when deciding to add an index.

Erik’s process is a good one. The real pain comes when there are 40-50 indexes on a table (seriously…) and there are a lot of similar-but-not-quite-similar-enough indexes.

Comments closed

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30

Month: April 2022