Press "Enter" to skip to content

Curated SQL Posts

Sharing Short Code Examples

John McCormack lays out the parameters for this T-SQL Tuesday:

T-SQL Tuesday this month is going back to basics and its all about code. I’d like to know “What are your go to handy short scripts”?

What are those little short bits of code that you can’t live without? I’m talking about little snippets that are only a few lines, that you may even have memorised. It could be T-SQL, PowerShell, Python or anything else you use day to day.

Click through for two of John’s.

Comments closed

UDFs and STRING_AGG

Erik Darling has a bone to pick with STRING_AGG():

If you’re like me and you got excited by the induction of STRING_AGG into the T-SQL Lexicon because of all the code odd-balling it would replace, you were likely also promptly disappointed for a few reasons.

Read on for one post which covers all of those reasons. Even with that disappointment, I’m still happy with STRING_AGG() on the whole, myself. There are some extra steps it’d be nice to eliminate in certain circumstances, but 60% of the time, it works every time.

Comments closed

Apache Flink 1.14.0 Released

Stephan Ewen and Johannes Moser have aa round-up of the latest Apache Flink updates:

The Apache Software Foundation recently released its annual report and Apache Flink once again made it on the list of the top 5 most active projects! This remarkable activity also shows in the new 1.14.0 release. Once again, more than 200 contributors worked on over 1,000 issues. We are proud of how this community is consistently moving the project forward.

This release brings many new features and improvements in areas such as the SQL API, more connector support, checkpointing, and PyFlink. A major area of changes in this release is the integrated streaming & batch experience. We believe that, in practice, unbounded stream processing goes hand-in-hand with bounded- and batch processing tasks, because many use cases require processing historic data from various sources alongside streaming data. Examples are data exploration when developing new applications, bootstrapping state for new applications, training models to be applied in a streaming application, or re-processing data after fixes/upgrades.

Read on for the list of changes.

Comments closed

Emulating Visual Deficiencies with Edge DevTools

Chris Webb shows off a feature in Microsoft Edge:

I’m not an expert on Power BI report accessibility like Meagan Longoria but I do know how important an issue accessibility is. I also know how difficult it can be to remember to check for accessibility issues when building reports which is why, when I was watching this video on new features in Edge DevTools, I was pleased to see that Edge now makes it easy to see how your report looks like when viewed by someone with vision deficiencies.

Click through to see this extension in action.

Comments closed

Working with DACPACs

Chad Callihan has a two-parter. First up is the process of creating a DACPAC:

DAC is the abbreviation for data-tier application and is an item containing the objects of a database. When put together into a versioned file to be used for deploying in SQL Server, that package is called a DACPAC. You can point a DACPAC at an existing database to deploy changes or use a DACPAC to create a brand new database with tables, procedures, and the rest of the database objects built in.

And once you have a DACPAC, you of course need something to do with it:

What if we want to use our DACPAC to create a new database using SSMS? Our first step will be to right click on Databases and select Deploy Data-tier Application.

As a concept, DACPACs are really great: I can package together tables, database code, and reference data in one bundle and import it into a variety of products (e.g., database projects, other SQL Server instances, Azure SQL Database). The problem is that once they get beyond a certain size, DACPACs are rather unwieldy and failure-prone to create and to extract.

Comments closed

Implementing LAG() and LEAD() in Power Query

Nick Edwards wants to perform a lag on data in Power Query:

I’m sure you’ve all heard of windowing functions in SQL Server? But what if you need to replicate these windowing functions in Power Query?

Here I have a sample CarSales.csv dataset of car sales per brand, per year, per month and the number of units sold. I will use this dataset to demonstrate how we can replicate the windowing LAG function in Power Query.

Click through to see the process.

Comments closed

Automating Semantic Versioning with Azure DevOps

Dave Ruijter shows how you can use Azure DevOps to perform automatic semantic versioning:

I am a fan of using semantic versioning (a.k.a. SemVer) for data solutions, following the v1.0.0 pattern. It helps in the communication between team members and stakeholders, by limiting ambiguity and misunderstandings related to the version of your solution’s releases. With semantic versioning, the trick is to increment the version according to the changes you have made since the latest release. Manually keeping track of that is not an easy task, especially for small teams, without the capacity to have somebody dedicated to this administration task. I found a way to make this a lot easier, leaning on the Pull Request description! And as a bonus, we will create some nice release notes automatically

Click through to see what you need to have set up on your Azure DevOps subscription and a detailed walkthrough of how to set it up.

Comments closed

Using the FIRST_VALUE() Window Function

Dave Mason explains what FIRST_VALUE() does:

Last week, I found myself with a T-SQL problem. A customer has a database where multiple “widget” records might exist in a table. End users might enter the duplicate rows, or they might be inserted by other automated processes. My task was to keep the most recent widget record. But, if there were any NULL values, look in the previous “duplicate” rows and use the most recent missing value for each column.

Click through to see it in action. And following up from Dave’s last point, IGNORE NULLS did originate in Azure SQL Edge, though hopefully we see it in the next version of on-premises SQL Server.

Comments closed

Adaptive Query Execution in Spark 3

Amarjeet Singh explains what Adaptive Query Execution is in Apache Spark:

As we all know optimization plays an important role in the success of spark SQL. Therefore, a lot of work has been done in this direction. Before spark 3.0, cost-based optimization was a major hit in which different stages related to cost (based on time efficiency and estimated CPU and I/O usage) are compared and executes the strategy which minimizes the cost. But, because of outdated statistics, it has become a sub-optimal technique. Therefore in spark 3.0, Adaptive Query Execution was introduced which aims to solve this by reoptimizing and adjusts the query plans based on runtime statistics collected during query execution. Thus re-optimization of the execution plan occurs after every stage as each stage gives the best place to do the re-optimization.

Item number 2 from the list is also available in SQL Server, giving you an idea that this is an active battleground for query processing in data platform technologies.

Comments closed

Constraint Programming with R and MiniZinc

Holger von Jouanne-Diedrich solves a classic puzzle:

The following puzzle is a well-known meme in social networks. It is said to have been invented by young Einstein and back in the days I was ambitious enough to solve it by hand (you should try too!).

Yet, even simpler is to use Constraint Programming (CP). An excellent choice for doing that is MiniZinc, a free and open-source constraint modelling language. And the best thing is that you can control it by R! If you want to see how, read on!

I’d solved it once by hand as well, but here we get to see a much easier route. Constraint-based programming is one of those things which doesn’t show up very often in the business world, but I think part of the reason is that most programming languages lack the capacity to implement constraints really well. It could also be that people are usually pretty mushy about laying out proper constraints.

Comments closed