
Day: October 21, 2021

SQL User-Defined Functions in Spark SQL

Serge Rielau and Allison Wang announce a new type of user-defined function in Spark SQL:

SQL UDFs are simple yet powerful extensions to Spark SQL. As functions, they provide a layer of abstraction to simplify query construction – making SQL queries more readable and modularized. Unlike UDFs that are written in a non-SQL language, SQL UDFs are more lightweight for SQL users to create. SQL function bodies are transparent to the query optimizer, thus making them more performant than external UDFs. SQL UDFs can be created as either temporary or permanent functions, be reused across multiple queries, sessions and users, and be access-controlled via Access Control Language (ACL). In this blog post, we will walk you through some key use cases of SQL UDFs with examples.

I look forward to dealing with cardinality issues and performance tuning these things in 5 years.

Comments closed

On Avoiding NULL

Aaron Bertrand defends NULL:

A long time ago, I answered a question about NULL on Stack Exchange entitled, “Why shouldn’t we allow NULLs?” I have my share of pet peeves and passions, and the fear of NULLs is pretty high up on my list. A colleague recently said to me, after expressing a preference to force an empty string instead of allowing NULL:

“I don’t like dealing with nulls in code.”

I’m sorry, but that’s not a good reason. How the presentation layer deals with empty strings or NULLs shouldn’t be the driver for your table design and data model. And if you’re allowing a “lack of value” in some column, does it matter to you from a logical standpoint whether the “lack of value” is represented by a zero-length string or a NULL? Or worse, a token value like 0 or -1 for integers, or 1900-01-01 for dates?

Definitely read what Aaron has to say. I disagree with the tenor of his point enough that, now that I’m actually blogging again, I’ll have a post up tomorrow laying out the core of my disagreement. Stay tuned!
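
To make the "does it matter from a logical standpoint" question concrete, here is a minimal sketch using Python's built-in sqlite3 module. It uses SQLite rather than SQL Server, and the table and column names are hypothetical. It stores the same "lack of value" three ways (NULL, a -1 sentinel, and an empty string) and shows that standard aggregates treat them differently.

```python
import sqlite3

# Hypothetical table: the same "lack of value" stored as NULL, as a -1 sentinel,
# and (for text) as an empty string, so the columns can be compared side by side.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    "id INTEGER, score_null INTEGER, score_sentinel INTEGER, "
    "note_null TEXT, note_empty TEXT)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 10, "ok", "ok"),
        (2, 20, 20, "ok", "ok"),
        (3, None, -1, None, ""),  # row 3 has no real score or note
    ],
)

# AVG skips NULL but treats the sentinel -1 as a real score.
avg_null, avg_sentinel = conn.execute(
    "SELECT AVG(score_null), AVG(score_sentinel) FROM readings"
).fetchone()

# COUNT(column) skips NULL but counts the empty string as a value.
count_null, count_empty = conn.execute(
    "SELECT COUNT(note_null), COUNT(note_empty) FROM readings"
).fetchone()

print(avg_null, round(avg_sentinel, 2))  # 15.0 9.67
print(count_null, count_empty)           # 2 3
```

The sentinel drags the average from 15.0 down to about 9.67, and the empty string is counted as a note. Which representation to prefer is the design question at issue here. The sketch only shows that the choice is not neutral once queries aggregate the column.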

Building a Welcome Prompt for PowerShell

Jeffrey Hicks makes PowerShell more welcoming:

I realized it had been a while since I wrote a Friday Fun post. These posts are intended to demonstrate PowerShell in a fun and often non-practical way. The end result is generally irrelevant. The PowerShell scripting techniques and concepts I use are the real takeaways. The task is nothing more than a means to an end.

Today’s project is inspired by Linux. Specifically, the WSL Ubuntu installation I run in Windows Terminal. When I first launch it, I get a welcome screen like this.

I thought, why not do something similar for PowerShell?

Read on to see the result, which looks quite nice.


The Purpose of PowerShell Providers

Robert Cain explains what providers do in PowerShell:

Providers are an interesting concept in PowerShell. A provider allows us to use a standard method to communicate with different areas of our computer using the same set of commands. For example, the file system is accessed using a provider. When you use Get-ChildItem, it goes through the file system provider to return a list of the files on your computer.

We’ll take a deeper look in a moment, but first let me mention that for all of the examples we’ll display the code, then under it the result of our code. In this article I’ll be using PowerShell Core 7.1.5 and VSCode. The examples should work in PowerShell 5.1 in the PowerShell ISE, although they’ve not been tested there.

Click through for a listing of several providers and more detail on two of them.
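
The following is an analogy only. PowerShell's real providers are .NET classes with their own API, and nothing below is that API. This minimal Python sketch illustrates the idea Robert describes. A single hypothetical get_child_item function hands the same request to different backends depending on the drive prefix. A file-system provider is the default, and an environment-variable provider sits behind Env:. All class and function names are invented for illustration.

```python
import os
from pathlib import Path
from typing import Dict, List, Protocol


class Provider(Protocol):
    """Hypothetical provider interface: each backing store lists children the same way."""

    def child_items(self, path: str) -> List[str]: ...


class FileSystemProvider:
    def child_items(self, path: str) -> List[str]:
        # Entries in a directory (files and subfolders), sorted for stable output.
        return sorted(entry.name for entry in Path(path or ".").iterdir())


class EnvironmentProvider:
    def child_items(self, path: str) -> List[str]:
        # Environment variables are flat, so any path after "Env:" is ignored here.
        return sorted(os.environ)


# Drive names mapped to providers, loosely mirroring PowerShell's Env: drive.
DRIVES: Dict[str, Provider] = {"Env": EnvironmentProvider()}
DEFAULT_PROVIDER: Provider = FileSystemProvider()


def get_child_item(path: str) -> List[str]:
    """One command for every store: route the call by drive prefix."""
    drive, sep, rest = path.partition(":")
    if sep and drive in DRIVES:
        return DRIVES[drive].child_items(rest)
    # Anything else (including Windows paths like C:\Temp) goes to the file system.
    return DEFAULT_PROVIDER.child_items(path)


print(get_child_item("."))     # names in the current directory
print(get_child_item("Env:"))  # environment variable names
```

With this design, callers never change. Supporting another kind of store would mean registering one more provider in DRIVES rather than learning a new set of commands.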


Scaling an Azure SQL Managed Instance

Arun Sirpal wants more power:

No doubt there will be times when you need to scale up the actual instance in terms of vCores, but you may also want to move across tiers (for example, General Purpose to Business Critical). If you remember, a few blog posts ago I said it was really important to plan for these activities during the build phase, and more specifically to get the subnet range right. If you have done that, then you will be fine.

Click through for the process, though do note the amount of time it takes. One of the early ideals of cloud processing was that you could seamlessly scale up and down with no effect on the end user. In some services (especially things like function apps, web apps, and containers in Kubernetes pods), you get that experience. When it comes to almost anything data-related, though, immediate scaling is a hard no, to the point where I’d assume the downtime is more than you can afford until proven otherwise.
