Press "Enter" to skip to content

Month: October 2021

SQL User-Defined Functions in Spark SQL

Serge Rielau and Allison Wang announce a new type of user-defined function in Spark SQL:

SQL UDFs are simple yet powerful extensions to Spark SQL. As functions, they provide a layer of abstraction to simplify query construction – making SQL queries more readable and modularized. Unlike UDFs that are written in a non-SQL language, SQL UDFs are more lightweight for SQL users to create. SQL function bodies are transparent to the query optimizer thus making them more performant than external UDFs. SQL UDFs can be created as either temporary or permanent functions, be reused across multiple queries, sessions and users, and be access-controlled via Access Control Language (ACL). In this blog, we will walk you through some key use cases of SQL UDFs with examples.

I look forward to dealing with cardinality issues and performance tuning these things in 5 years.
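To make the syntax concrete, here’s a minimal sketch in PySpark. The function name and data are my own, and it assumes a runtime that supports SQL UDFs (such as Databricks or a sufficiently recent Spark release); it isn’t code from the announcement:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-udf-sketch").getOrCreate()

# A SQL UDF: the body is plain SQL, so the optimizer can see through it
# rather than treating it as an opaque external function.
spark.sql("""
    CREATE OR REPLACE TEMPORARY FUNCTION to_fahrenheit(celsius DOUBLE)
    RETURNS DOUBLE
    RETURN celsius * 9.0 / 5.0 + 32.0
""")

# Use it like any built-in function.
spark.sql("""
    SELECT temp_c, to_fahrenheit(temp_c) AS temp_f
    FROM VALUES (0.0), (37.0), (100.0) AS t(temp_c)
""").show()
```

Because the body is just SQL, the optimizer can fold it into the calling query, which is where the performance claim in the quote comes from.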


On Avoiding NULL

Aaron Bertrand defends NULL:

A long time ago, I answered a question about NULL on Stack Exchange entitled, “Why shouldn’t we allow NULLs?” I have my share of pet peeves and passions, and the fear of NULLs is pretty high up on my list. A colleague recently said to me, after expressing a preference to force an empty string instead of allowing NULL:

“I don’t like dealing with nulls in code.”

I’m sorry, but that’s not a good reason. How the presentation layer deals with empty strings or NULLs shouldn’t be the driver for your table design and data model. And if you’re allowing a “lack of value” in some column, does it matter to you from a logical standpoint whether the “lack of value” is represented by a zero-length string or a NULL? Or worse, a token value like 0 or -1 for integers, or 1900-01-01 for dates?
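If the behavioral difference isn’t familiar, here’s a tiny sketch (using Python’s sqlite3 purely for convenience; the table and values are made up, and the three-valued logic shown is standard SQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, middle_name TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("Ann", None), ("Bob", ""), ("Cal", "J")])

# An empty string is a real value and compares like any other string;
# NULL is "unknown", so it is filtered out by both = and <> predicates.
print(conn.execute(
    "SELECT name FROM people WHERE middle_name <> 'J'").fetchall())   # [('Bob',)]
print(conn.execute(
    "SELECT name FROM people WHERE middle_name IS NULL").fetchall())  # [('Ann',)]
```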

Definitely read what Aaron has to say. I disagree with the tenor of his point enough that, now that I’m actually blogging again, I’ll have a post up tomorrow laying out the core of my disagreement. Stay tuned!


Building a Welcome Prompt for Powershell

Jeffrey Hicks makes Powershell more welcoming:

I realized it had been a while since I wrote a Friday Fun post. These posts are intended to demonstrate PowerShell in a fun and often non-practical way. The end result is generally irrelevant. The PowerShell scripting techniques and concepts I use are the real takeaways. The task is nothing more than a means to an end.

Today’s project is inspired by Linux. Specifically, the WSL Ubuntu installation I run in Windows Terminal. When I first launch it, I get a welcome screen like this.

I thought, why not do something similar for PowerShell?

Read on to see the result, which looks quite nice.


The Purpose of Powershell Providers

Robert Cain explains what providers do in Powershell:

Providers are an interesting concept in PowerShell. A provider allows us to use a standard method to communicate with different areas of your computer using the same set of commands. For example, the file system is accessed using a provider. When you use Get-ChildItem it goes through the file system provider to return a list of the files on your computer.

We’ll take a deeper look in a moment, but first let me mention that for all of the examples we’ll display the code, then under it the result of our code. In this article I’ll be using PowerShell Core, 7.1.5, and VSCode. The examples should work in PowerShell 5.1 in the PowerShell IDE, although they’ve not been tested there.

Click through for a listing of several providers and more detail on two of them.


Scaling an Azure SQL Managed Instance

Arun Sirpal wants more power:

No doubt there will be times when you need to scale up the actual instance in terms of vCores, but you may also want to move across tiers (for example, General Purpose to Business Critical). If you remember, a few blog posts ago I said it was really important to plan for these activities during the build phase, more specifically getting the subnet range right. If you’ve done that then you will be fine.

Click through for the process, though do note the amount of time it takes. One of the early ideals of cloud computing was that you could seamlessly scale up and down with no effect on the end user. With some services (especially things like function apps, web apps, and VMs in a Kubernetes pod), you get that experience. When it comes to almost anything data-related, though, immediate scaling is a hard no, to the point where I’d assume you can’t afford the downtime to do it until proven otherwise.


DBScan for Clustering in Python

Brendan Tierney takes us through the DBScan algorithm:

Let’s illustrate the use of DBScan (Density Based Spatial Clustering of Applications with Noise), using the scikit-learn Python package, for a “manufactured” dataset. This example will illustrate how this density based algorithm works (See my other blog post which compares different Clustering algorithms for this same dataset). DBSCAN is better suited for datasets that have disproportional cluster sizes (or densities), and whose data can be separated in a non-linear fashion.

Click through for an interesting read on a dataset which is historically difficult to cluster (unless you know the general shape and translate everything to polar coordinates).
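For a quick flavor of the algorithm (not Brendan’s exact dataset or parameter values), here’s a minimal scikit-learn sketch on concentric circles, a classic example of non-linearly separable clusters:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_circles
from sklearn.preprocessing import StandardScaler

# A "manufactured" dataset: two concentric rings that centroid-based
# methods such as k-means cannot separate.
X, _ = make_circles(n_samples=500, factor=0.4, noise=0.05, random_state=42)
X = StandardScaler().fit_transform(X)

# eps and min_samples control what counts as a dense neighbourhood;
# points that belong to no dense region are labelled -1 (noise).
labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
print("clusters found:", len(set(labels) - {-1}))
print("noise points:", int((labels == -1).sum()))
```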


External Temp Tables and Plan Reuse

David Fowler has a warning about stored procedures which use temp tables created by other processes:

Here’s an interesting issue that recently came up. We were seeing very high compilations and recompilations on a server to the point that it started causing us some very serious issues (admittedly this wasn’t the sole issue but it was certainly a contributing factor, the other factors were also very interesting so I might look at those in another post).

After looking in the plan cache we could see a very high number of single use plans for a particular stored procedure. Now as you probably know, SQL will usually cache an execution plan and use it over and over whenever a particular query runs. SQL’s lazy and it doesn’t want to bother compiling queries unless it really has to.

So what was going on, why wasn’t SQL able to reuse the cached plan?

Read on for the solution, as well as the impact of the problem and ways to work around it.
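To picture the scenario, here’s a minimal sketch of my own (placeholder names and connection string, not David’s code): a stored procedure that references a temp table created by its caller, which is the pattern that tends to defeat plan reuse.

```python
import pyodbc

# Placeholder connection string; adjust for your environment.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;DATABASE=Sandbox;"
    "Trusted_Connection=yes;TrustServerCertificate=yes",
    autocommit=True)
cur = conn.cursor()

# The procedure references #work but never creates it, so every caller
# supplies a different temp table object to the same procedure.
cur.execute("""
CREATE OR ALTER PROCEDURE dbo.SumExternalTempTable
AS
BEGIN
    SET NOCOUNT ON;
    SELECT SUM(Amount) AS Total FROM #work;
END
""")

# The caller creates the temp table, then invokes the procedure.
cur.execute("CREATE TABLE #work (Amount INT); INSERT INTO #work VALUES (1), (2), (3);")
cur.execute("EXEC dbo.SumExternalTempTable;")
print(cur.fetchone().Total)  # 6
```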


Considerations when Upgrading to SQL Server 2019

Tom Collins has a checklist of things to consider before upgrading to SQL Server 2019:

Application Lifecycle – Each SQL Server version gets 10 years of support: 5 years in mainstream support and 5 years in extended support. Mainstream support includes functional, performance, scalability and security updates; extended support includes only security updates.

Analyse these support levels in the context of your organisation’s requirements. If the organization has a large footprint with a multi-year upgrade cycle, that will carry different considerations than a smaller-scale environment.

Read on for the full list.


TMSavePoint::GetProxyImpl Error with Power BI Premium Per User

Gilbert Quevauvilliers diagnoses an error:

I have been doing a lot of work recently using Power BI Premium Per User, and I got the following error when trying to update my fact table in Power Query.

“An unexpected error occurred (file ‘tmsavepoint.cpp’, line 1503, function ‘TMSavePoint::GetProxyImpl’)”

Read on to understand what this error means and how you can fix it. Do be sure to look out for the very important warning about two-thirds of the way in.
