Press "Enter" to skip to content

Curated SQL Posts

Scaling an Azure SQL Managed Instance

Arun Sirpal wants more power:

No doubt there will be times where you need to scale up the actual instance in terms of vCores but also you may want to move across tiers (for example General Purpose to Business Critical). If you remember a few blog posts ago I said it was really important to plan for these activities during the build phase, more specifically get the subnet range right. If you done that then you will be fine.

Click through for the process, though do note the amount of time it takes. One of the early ideals of cloud processing would be that you could seamlessly scale up and down with no effects on the end user. In some services (especially things like function apps, web apps, and VMs in a Kubernetes pod), you get that experience. When it comes to almost anything data-related, though, immediate scaling is a hard no, to the point where I’d assume you can’t afford the downtime to do it until proven otherwise.

Comments closed

DBScan for Clustering in Python

Brendan Tierney takes us through the DBScan algorithm:

Let’s illustrate the use of DBScan (Density Based Spatial Clustering of Applications with Noise), using the scikit-learn Python package, for a “manufactured” dataset. This example will illustrate how this density based algorithm works (See my other blog post which compares different Clustering algorithms for this same dataset). DBSCAN is better suited for datasets that have disproportional cluster sizes (or densities), and whose data can be separated in a non-linear fashion.

Click through for an interesting read on a dataset which is historically difficult to cluster (unless you know the general shape and translate everything to polar coordinates).

Comments closed

External Temp Tables and Plan Reuse

David Fowler has a warning about stored procedures which use temp tables created by other processes:

Here’s an interesting issue that recently came up. We were seeing very high compilations and recompilations on a server to the point that it started causing us some very serious issues (admittedly this wasn’t the sole issue but it was certainly a contributing factor, the other factors were also very interesting so I might look at those in another post).

After looking in the plan cache we could see a very high number of single use plans for a particular stored procedure. Now as you probably know, SQL will usually cache an execution plan and use it over and over whenever a particular query runs. SQL’s lazy and it doesn’t want to bother compiling queries unless it really has to.

So what was going on, why wasn’t SQL able to reuse the cached plan?

Read on for the solution, as well as the impact of the problem and ways to work around it.

Comments closed

Considerations when Upgrading to SQL Server 2019

Tom Collins has a checklist of things to consider before upgrading to SQL Server 2019:

Application Lifecycle – Each SQL Server version gets 10 years support. 5 years in mainstream support & 5 years in extended support 

                                    Mainstream support includes functional, performance, scalability and security updates.
                                    Extended support includes only security updates.

Analyse these support levels in the context of your organisations requirements . If the organization has a large footprint with a large multi-year upgrade cycle – than that will have different considerations to smaller scale

Read on for the full list.

Comments closed

TMSavePoint::GetProxyImpl Error with Power BI Premium Per User

Gilbert Quevauvilliers diagnoses an error:

I have been doing a lot of work recently using Power BI Premium Per User and recently I got the following error below when trying to update my fact table in Power Query.

“An unexpected error occurred (file ‘tmsavepoint.cpp’, line 1503, function ‘TMSavePoint::GetProxyImpl’)”

Read on to understand what this error means and how you can fix it. Do be sure to look out for the very important warning about 2/3 of the way in.

Comments closed

T-SQL Tuesday 143 Round-Up

John McCormack summarizes T-SQL Tuesday #143:

What an honour it was to host T-SQL Tuesday this month and I received some really great submissions. This wrap up post aims to give a quick insight into each of them in the hope that more members of the SQL Family can find some time to click on them and learn more. I counted 22 posts including my own which was a great response. If you missed the original invite, you can find the link below.

Click through for all of the responses.

Comments closed

Spill-Based tempdb Contention

Erik Darling wants you to use a coaster:

What was puzzling them was that nothing in the stored procedure did anything with temporary objects, cursors, or any of the other obvious things that go to el baño público. And the query plan itself was a pretty simple little loopy thing.

It wasn’t until I grabbed a quick sample of actual execution plans with Extended Events that the problem became more obvious.

Despite the fact that the query was only passing around 7 rows, we were getting a one row estimate, and there was a spilling Sort operator.

Click through for an example of what happened, as well as how you can fix it.

Comments closed

Session Windows in Spark Structured Streaming

Jungtaek Lim, et al, announce support for session windows in Spark Structured Streaming:

Tumbling windows are a series of fixed-sized, non-overlapping and contiguous time intervals. An input can only be bound to a single window.

Sliding windows are similar to the tumbling windows from the point of being “fixed-sized”, but windows can overlap if the duration of the slide is smaller than the duration of the window, and in this case, an input can be bound to the multiple windows.

Session windows have a different characteristic compared to the previous two types. Session window has a dynamic size of the window length, depending on the inputs. A session window starts with an input and expands itself if the following input has been received within the gap duration. A session window closes when there’s no input received within the gap duration after receiving the latest input. This enables you to group events until there are no new events for a specified time duration (inactivity).

Click through for more details. You could implement session windows when querying existing data using a gaps and islands approach (where you increment the island count when you have a lagged difference greater than the cutoff point), but for streaming scenarios, it’s very handy to have this as a native window type.

Comments closed

De-Cluttering a LIne Chart

Mike Cisneros trims the fat, and there’s a lot of it:

My colleague Alex was the first person I heard refer to the underlying structure of a graph as its “skeleton.” This includes all the features that give form and shape to your graph, but that ideally—like bones—we never even really notice or see. The skeleton frames and structures our graph, but doesn’t include the data itself. 

If we don’t take the time to organize the skeleton of a chart thoughtfully and intentionally, our data won’t be appropriately supported. To an audience, something will always seem a little bit amiss.

Click through for a Halloween-themed visual update.

Comments closed