Press "Enter" to skip to content

Curated SQL Posts

Use Cases for Window Functions

I continue a series on window functions in SQL Server:

To this point, we’ve looked at five classes of window function in SQL Server. I’ve given you a couple of solid use cases, but for the most part, we’ve focused on what the classes of window functions are. Now we’re going to talk about why you want to use them.

As is the norm for me, it takes about seven posts before I explain why you might want to do something.

Comments closed

Data Processing in Data Explorer Pools

Tsuyoshi Matsuzaki shows us how Data Explorer pools work in Azure Synapse Analytics:

In Microsoft Ignite 2021, new Data Explorer (DX) pool in Azure Synapse Analytics is released in preview. You might wonder which one to choose among 3 different analytical pools – Spark pool, Dedicated SQL pool, and DX pool.

In this post, I’ll briefly summarize how data is processed in Data Explorer (Kusto) – Azure Data Explorer (ADX) and Azure Synapse Data Explorer (DX) pool.
I hope this will give you a hint for your optimal analytical platform.

Read on for this explanation.

Comments closed

CONVERT and Binary Styles

Abayomi Obawomiye has style:

I recently had a requirement to load some data from a source table to another destination table. The destination columns were exactly the same as the source columns with the same data types and length. The only difference was that some columns on the destination table must be encrypted. The task was to use the SHA2_512 encryption algorithm to encrypt the “sensitive” data. I will talk more about the encryption algorithm in another post.

To achieve this, I needed to use the HASHBYTES function in SQL Server. The challenge was that this function used with the SHA2_512 encryption algorithm will return a fixed character length of 64 characters which will be longer than the character length on my destination table. As a result, SQL Server will throw a truncation error. I will demonstrate this below.

One really important point: SHA is not encryption; it’s a hash (which is why the function is HASHBYTES() instead of something like EncryptByKey() as column-level security uses). Hashes are intended to be a one-way trip, whereas encryption implies an ability to decrypt if you have the relevant key details. Here, the use looks to be obfuscating the text of sensitive data fields, perhaps for loading in a dev/test environment, and so the actions themselves are quite reasonable.

By the way, the styles Abayomi talks about are all listed in this Docs page. Turns out that if you’re using a money datatype, you can use CONVERT() to display the end result with commas.

Comments closed

Checking Guest Account Status

Tom Collins does not want guests barging into SQL Server unannounced:

A common  SQL Server Security check is to identify if the  guest id is enabled.        By default  , the SQL Server  guest id is disabled , but for various reasons it can become enabled. 

To identify the guest status you can either use the SQL Server Management Studio (SSMS) or through SQL code. 

Click through to see how. I don’t think I’ve ever seen a use case for having the guest account enabled on a SQL Server instance.

Comments closed

Powershell Functions and Return Type Oddities

Dave Mason goes down the rabbit hole:

As a result of some struggles trying to automate a process, I’ve learned some things about PowerShell. After getting to the bottom of a time-consuming problem, I thought it worth a blog post that might save someone else some time and heartache.

Let’s begin with this simple function named Get-RandomDate. It generates and returns a random date that is between today and X days ago. It has an input parameter $DaysAgo, which is of type [System.Int32]–it is a mandatory parameter.

That’s all straightforward, but then things get weird.

Comments closed

Understanding Capitalization Changes in Power BI

Jeroen ter Heerdt explains why you might see the capitalization change in text loaded into Power BI:

Some of you might have seen this before – you load some data into Power BI and suddenly the capitalization (uppercase / lowercase) of your text changes on you. Let me explain what is happening here.

Click through to learn why. Normally I’d say “and what you can do about it,” but there’s not much. I guess the best answer is, embrace consistent capitalization and then you won’t notice anything.

Comments closed

Managing Spark on Kubernetes instead of YARN

Rohit Choudhary argues that it’s a good idea to move from YARN to Kubernetes for your Spark clusters:

When it comes to data operations, Spark provides a tremendous advantage as a resource for data operations because it aligns with the things that make data ops valuable. It is optimized for machine learning and AI, which are used for batch processing (in real-time and at scale), and it is adept at operating within different types of environments.

Spark doesn’t completely manage these clusters of machines but instead uses a cluster manager (known as a scheduler). Most companies have traditionally used the Java Virtual Machine (JVM)-based Hadoop YARN to manage their clusters. But with the dramatic rise of Kubernetes and cloud-native computing, many organizations are moving away from YARN to Kubernetes to manage their Spark clusters. Spark on Kubernetes is even now generally available since the Apache Spark 3.1 release in March 2021.

I see some of the benefits there but am not totally sold, especially given the complexity of Kubernetes and its own lack of built-in security measures.

Comments closed

Ordered Set Functions in SQL Server

I continue a series on window functions in SQL Server:

As of SQL Server 2019, there is only one ordered set function: STRING_AGG(). I like STRING_AGG() a lot, especially because it means my days of needing to explain the STUFF() + FOR XML PATH trick to concatenate values together in SQL Server are numbered.

STRING_AGG() is interesting in that we categorize it as a window function and yet it violates my first rule of window functions: there isn’t an OVER() clause. Instead, it accepts but does not require a WITHIN GROUP() clause. Let’s see it in action.

Click through for a look at that, as well as a little hint that maybe we’ve seen ordered set functions before in a different guise.

Comments closed

FAST_FORWARD and Cursors

Joe Obbish skips past the commercials:

If you’re like me, you started your database journey by defining cursors with the default options. This went on until a senior developer or DBA kindly pointed out that you can get better performance by using the FAST_FORWARD option. Or maybe you were a real go-getter and found Aaron Bertrand’s performance benchmarking blog post on different cursor options. I admit that for many years I didn’t care to know why FAST_FORWARD sometimes made my queries faster. It had “FAST” in the name and that was good enough for me.

Recently I saw a production issue where using the right cursor options led to a 1000X performance improvement. I decided that ten years of ignorance was enough and finally did some research on different cursor options. This post contains a reproduction and discussion of the production issue.

I thought everybody knew how this works: the database streams the data tape from the supply pully to the play shaft by using a sprocket to rotate the gear in the cassette at a fixed speed. The FAST_FORWARD cursor option engages the fast forward idler in the VCR database and causes rotation to occur more rapidly than normal.

Comments closed