Press "Enter" to skip to content

Curated SQL Posts

Find and Replace Text Strings in SQL

Ed Pollack deals with the lack of regular expressions in T-SQL:

This article is focused on how to write, simplify, and get the most out of string data using the least amount of code to do so. It’s a reminder that code complexity is equivalent to technical debt and that in these scenarios, less is always more.

Text-manipulating functions will be introduced and reviewed throughout this article, so feel free to perform searches against it to find what you are looking for.

Read on for mechanisms to split strings into multiple rows, build strings from multiple rows, and search+replace within strings.

Comments closed

An Overview of DAX Visual Calculations

Teo Lachev takes us through visual calculations in DAX:

Suppose you need a measure that calculates the difference between the product categories in the order they were sorted in the visual. Implementing this as a regular DAX measure is a challenge. Yet, if we had a way to work with the cells in the visual, we can easily find a way to get this to work. Ideally, this would work similar in Excel, but DAX doesn’t know a think about cell references. However, visual calculations do.

As always, Teo gives us a view of the good, the bad, and the ugly with this feature.

Comments closed

Finding the First of Many Similar Rows in SQL

Tobias Petry shows off a feature in Postgres:

SQL is a straightforward and expressive language, but it sometimes lacks constructs for writing queries in a simple way. It’s more complicated than it should be to write a query to get, e.g., only the most expensive order for every customer of the current fiscal year. You can either use PostgreSQL’s vendor-specific DISTINCT ON feature or window functions for every other database.

Click through for the normal method (which works for MySQL, Oracle, SQL Server, etc.) and the Postgres way.

Comments closed

Renaming Multiple Columns in a PySpark Notebook

Gilbert Quevauvilliers wants one rename to rule them all:

Following on from my previous blog post this blog post I’m going to demonstrate how to bulk rename column names in a single step instead of having to rename them individually.

The reason this came about is because I had a set of data where the column names had the square brackets which I wanted to remove.

As shown below I have highlighted 2 column names with the square brackets.

Read on to see how you can perform somewhat-generic rename operations in Spark notebooks.

Comments closed

Feature Request: Right-Click to Open Plan in New Tab

Erik Darling wants you to vote:

One problem I run into regularly is when I’m tuning a query, and I want to keep one of the resulting execution plans available, so I can:

  1. Compare it after I make some other changes and run the query again
  2. Do some additional analysis without worrying about accidentally losing the plan

The only way to do that is to save the plan, stick the XML in another tool, or keep opening new query tabs to run things in, where I won’t lose the plan.

Vote for Erik’s idea.

Comments closed

Azure Data Studio 1.48 Now Available

David Levy gives us the latest:

The February release of Azure Data Studio (version 1.48) includes several new features and enhancements, such as:

  • Support for restores from URLs, including S3-compatible storage, via the restore dialog
  • Query Editor now includes SPIDs in tab titles
  • You can now specify a connection when launching ADS from the command line
  • You can now enable Ledger when creating a database in the Create Database dialog

Read on for more about each of these topics.

Comments closed

Melting Datasets in R

Steven Sanderson performs a melt():

The melt() function in the data.table package is an extremely useful tool for reshaping datasets in R. However, for beginners, understanding how to use melt() can be tricky. In this post, I’ll walk through several examples to demonstrate how to use melt() to move from wide to long data formats.

“Melting,” by the way, is the R term for unpivoting data.

Comments closed

Generators in Python

Jack Wallen generates some values:

Within the realm of programming, a generator is a routine that is used to control interaction within a loop. Generators are useful when you want to produce a large sequence of values without storing them in memory at once. The ability to create a large sequence without using up memory is important, especially when dealing with Python programs that generate a large amount of information, such as a long sequence of numbers.

Read on to see how generators work and how to build a custom generator.

Comments closed

Apache Kafka 3.7 Released

Stanislav Kozlovski makes an announcement:

We are proud to announce the release of Apache Kafka® 3.7.0. This release contains many new features and improvements. This blog post will highlight some of the more prominent features. For a full list of changes, be sure to check the release notes.

See the Upgrading to 3.7.0 from any version 0.8.x through 3.6.x section in the documentation for the list of notable changes and detailed upgrade steps.

Read on to see what’s new. Looks like they’ve taken care of a couple dozen items in this release, so plenty to read there.

Comments closed

Classical Methods for Outlier Detection

Vinod Chugani is speaking my language:

Outliers are unique in that they often don’t play by the rules. These data points, which significantly differ from the rest, can skew your analyses and make your predictive models less accurate. Although detecting outliers is critical, there is no universally agreed-upon method for doing so. While some advanced techniques like machine learning offer solutions, in this post, we will focus on the foundational Data Science methods that have been in use for decades.

Vinod looks at a few techniques, including inter-quartile range and comparing results to an expected distribution. If you’re really excited about this topic, I know a guy who’s written a bit about it.

Comments closed