Press "Enter" to skip to content

Curated SQL Posts

DBScan for Clustering in Python

Brendan Tierney takes us through the DBScan algorithm:

Let’s illustrate the use of DBScan (Density Based Spatial Clustering of Applications with Noise), using the scikit-learn Python package, for a “manufactured” dataset. This example will illustrate how this density based algorithm works (See my other blog post which compares different Clustering algorithms for this same dataset). DBSCAN is better suited for datasets that have disproportional cluster sizes (or densities), and whose data can be separated in a non-linear fashion.

Click through for an interesting read on a dataset which is historically difficult to cluster (unless you know the general shape and translate everything to polar coordinates).

Comments closed

External Temp Tables and Plan Reuse

David Fowler has a warning about stored procedures which use temp tables created by other processes:

Here’s an interesting issue that recently came up. We were seeing very high compilations and recompilations on a server to the point that it started causing us some very serious issues (admittedly this wasn’t the sole issue but it was certainly a contributing factor, the other factors were also very interesting so I might look at those in another post).

After looking in the plan cache we could see a very high number of single use plans for a particular stored procedure. Now as you probably know, SQL will usually cache an execution plan and use it over and over whenever a particular query runs. SQL’s lazy and it doesn’t want to bother compiling queries unless it really has to.

So what was going on, why wasn’t SQL able to reuse the cached plan?

Read on for the solution, as well as the impact of the problem and ways to work around it.

Comments closed

Considerations when Upgrading to SQL Server 2019

Tom Collins has a checklist of things to consider before upgrading to SQL Server 2019:

Application Lifecycle – Each SQL Server version gets 10 years support. 5 years in mainstream support & 5 years in extended support 

                                    Mainstream support includes functional, performance, scalability and security updates.
                                    Extended support includes only security updates.

Analyse these support levels in the context of your organisations requirements . If the organization has a large footprint with a large multi-year upgrade cycle – than that will have different considerations to smaller scale

Read on for the full list.

Comments closed

TMSavePoint::GetProxyImpl Error with Power BI Premium Per User

Gilbert Quevauvilliers diagnoses an error:

I have been doing a lot of work recently using Power BI Premium Per User and recently I got the following error below when trying to update my fact table in Power Query.

“An unexpected error occurred (file ‘tmsavepoint.cpp’, line 1503, function ‘TMSavePoint::GetProxyImpl’)”

Read on to understand what this error means and how you can fix it. Do be sure to look out for the very important warning about 2/3 of the way in.

Comments closed

T-SQL Tuesday 143 Round-Up

John McCormack summarizes T-SQL Tuesday #143:

What an honour it was to host T-SQL Tuesday this month and I received some really great submissions. This wrap up post aims to give a quick insight into each of them in the hope that more members of the SQL Family can find some time to click on them and learn more. I counted 22 posts including my own which was a great response. If you missed the original invite, you can find the link below.

Click through for all of the responses.

Comments closed

Spill-Based tempdb Contention

Erik Darling wants you to use a coaster:

What was puzzling them was that nothing in the stored procedure did anything with temporary objects, cursors, or any of the other obvious things that go to el baño público. And the query plan itself was a pretty simple little loopy thing.

It wasn’t until I grabbed a quick sample of actual execution plans with Extended Events that the problem became more obvious.

Despite the fact that the query was only passing around 7 rows, we were getting a one row estimate, and there was a spilling Sort operator.

Click through for an example of what happened, as well as how you can fix it.

Comments closed

Session Windows in Spark Structured Streaming

Jungtaek Lim, et al, announce support for session windows in Spark Structured Streaming:

Tumbling windows are a series of fixed-sized, non-overlapping and contiguous time intervals. An input can only be bound to a single window.

Sliding windows are similar to the tumbling windows from the point of being “fixed-sized”, but windows can overlap if the duration of the slide is smaller than the duration of the window, and in this case, an input can be bound to the multiple windows.

Session windows have a different characteristic compared to the previous two types. Session window has a dynamic size of the window length, depending on the inputs. A session window starts with an input and expands itself if the following input has been received within the gap duration. A session window closes when there’s no input received within the gap duration after receiving the latest input. This enables you to group events until there are no new events for a specified time duration (inactivity).

Click through for more details. You could implement session windows when querying existing data using a gaps and islands approach (where you increment the island count when you have a lagged difference greater than the cutoff point), but for streaming scenarios, it’s very handy to have this as a native window type.

Comments closed

De-Cluttering a LIne Chart

Mike Cisneros trims the fat, and there’s a lot of it:

My colleague Alex was the first person I heard refer to the underlying structure of a graph as its “skeleton.” This includes all the features that give form and shape to your graph, but that ideally—like bones—we never even really notice or see. The skeleton frames and structures our graph, but doesn’t include the data itself. 

If we don’t take the time to organize the skeleton of a chart thoughtfully and intentionally, our data won’t be appropriately supported. To an audience, something will always seem a little bit amiss.

Click through for a Halloween-themed visual update.

Comments closed

Upcoming Power BI Improvements

Matt Allington looks at what’s soon-to-be-new in Power BI:

You may already be aware (but then again, maybe not) that Microsoft shares its plans for upcoming new features for the Power Platform every 6 months, and Power BI forms a subset of that plan. The next plan is called Power Platform Release Plan 2021 Wave 2. It takes a lot of planning to continuously improve software, keeping the current state working while adding new features. Part of this planning process is that Microsoft shares the big picture of what is coming.

I was reviewing the next release plan this week and wanted to share a couple of things that are coming that look exciting to me (as a user/developer of Power BI Reports). The 5 new features I love, and want to call out are listed below (all are pro features except the last one):

Click through for the list.

Comments closed