
Month: October 2021

Automating Semantic Versioning with Azure DevOps

Dave Ruijter shows how you can use Azure DevOps to perform automatic semantic versioning:

I am a fan of using semantic versioning (a.k.a. SemVer) for data solutions, following the v1.0.0 pattern. It helps in the communication between team members and stakeholders, by limiting ambiguity and misunderstandings related to the version of your solution’s releases. With semantic versioning, the trick is to increment the version according to the changes you have made since the latest release. Manually keeping track of that is not an easy task, especially for small teams, without the capacity to have somebody dedicated to this administration task. I found a way to make this a lot easier, leaning on the Pull Request description! And as a bonus, we will create some nice release notes automatically.

Click through to see what you need to have set up on your Azure DevOps subscription and a detailed walkthrough of how to set it up.
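
To make the increment rule concrete, here is a minimal Python sketch of a SemVer bump. It is not Dave's Azure DevOps pipeline; the `bump` function and the "major" / "minor" / "patch" change labels are hypothetical names chosen for illustration, standing in for whatever change type gets read from the Pull Request description.

```python
import re

# Accepts versions written in the v1.0.0 pattern (the leading "v" is optional).
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def bump(version: str, change: str) -> str:
    """Return the next version for a given change type (hypothetical helper)."""
    match = SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())

    if change == "major":   # breaking change: reset minor and patch
        return f"v{major + 1}.0.0"
    if change == "minor":   # backwards-compatible feature: reset patch
        return f"v{major}.{minor + 1}.0"
    if change == "patch":   # backwards-compatible fix
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

For example, `bump("v1.4.2", "minor")` gives `v1.5.0`, while `bump("v1.4.2", "major")` gives `v2.0.0`.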

Comments closed

Using the FIRST_VALUE() Window Function

Dave Mason explains what FIRST_VALUE() does:

Last week, I found myself with a T-SQL problem. A customer has a database where multiple “widget” records might exist in a table. End users might enter the duplicate rows, or they might be inserted by other automated processes. My task was to keep the most recent widget record. But, if there were any NULL values, look in the previous “duplicate” rows and use the most recent missing value for each column.

Click through to see it in action. And following up from Dave’s last point, IGNORE NULLS did originate in Azure SQL Edge, though hopefully we see it in the next version of on-premises SQL Server.
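
The "latest non-NULL value per column" behavior is easy to see outside of T-SQL. The sketch below is a plain Python illustration of the result Dave describes, not his query; `coalesce_latest` and the sample widget columns are hypothetical.

```python
def coalesce_latest(rows: list) -> dict:
    """Merge duplicate rows into one, keeping the most recent non-None value per column.

    `rows` is assumed to be ordered oldest first, with each row a dict of
    column name -> value (None plays the role of NULL).
    """
    merged = {}
    for row in reversed(rows):  # walk from the newest row to the oldest
        for column, value in row.items():
            # Only fill a column that has no non-None value yet.
            if merged.get(column) is None:
                merged[column] = value
    return merged


widgets = [
    {"name": "Widget A", "color": "red", "weight": 10},    # oldest
    {"name": "Widget A", "color": None, "weight": 12},
    {"name": "Widget A", "color": None, "weight": None},   # newest
]
result = coalesce_latest(widgets)
```

Here `result` keeps the newest weight that is present (12) and falls back to the oldest row for the color ("red"), which is the same effect as taking the first value in newest-first order while ignoring NULLs.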


Adaptive Query Execution in Spark 3

Amarjeet Singh explains what Adaptive Query Execution is in Apache Spark:

As we all know, optimization plays an important role in the success of Spark SQL. Therefore, a lot of work has been done in this direction. Before Spark 3.0, cost-based optimization was a major hit, in which different stages related to cost (based on time efficiency and estimated CPU and I/O usage) are compared and the strategy which minimizes the cost is executed. But, because of outdated statistics, it has become a sub-optimal technique. Therefore, in Spark 3.0, Adaptive Query Execution was introduced, which aims to solve this by reoptimizing and adjusting the query plans based on runtime statistics collected during query execution. Thus, re-optimization of the execution plan occurs after every stage, as each stage gives the best place to do the re-optimization.

Item number 2 from the list is also available in SQL Server, giving you an idea that this is an active battleground for query processing in data platform technologies.
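
As a rough illustration of why runtime statistics matter, here is a toy Python model of one re-optimization decision: choosing a join strategy from a stale estimate versus from the size actually observed after a stage finishes. This is only a sketch of the idea, not Spark's implementation; `choose_join` and `plan_join` are hypothetical names, and the threshold simply mirrors Spark's default `spark.sql.autoBroadcastJoinThreshold` of 10 MB.

```python
# Mirrors Spark's default spark.sql.autoBroadcastJoinThreshold (10 MB).
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


def choose_join(build_side_bytes: int) -> str:
    """Pick a join strategy from the size of the smaller join input (toy rule)."""
    if build_side_bytes <= BROADCAST_THRESHOLD_BYTES:
        return "broadcast_hash_join"
    return "sort_merge_join"


def plan_join(estimated_bytes: int, runtime_bytes: int) -> dict:
    """Compare the plan built from estimates with the plan built from runtime sizes."""
    static_plan = choose_join(estimated_bytes)   # what a stale estimate suggests
    adaptive_plan = choose_join(runtime_bytes)   # what the finished stage reveals
    return {
        "static": static_plan,
        "adaptive": adaptive_plan,
        "replanned": static_plan != adaptive_plan,
    }


# Stale statistics say 500 MB, but a filter left only 2 MB at runtime.
decision = plan_join(500 * 1024 * 1024, 2 * 1024 * 1024)
```

In this example the static plan would pick a sort-merge join, while the runtime size allows a switch to a broadcast join.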


Constraint Programming with R and MiniZinc

Holger von Jouanne-Diedrich solves a classic puzzle:

The following puzzle is a well-known meme in social networks. It is said to have been invented by young Einstein and back in the days I was ambitious enough to solve it by hand (you should try too!).

Yet, even simpler is to use Constraint Programming (CP). An excellent choice for doing that is MiniZinc, a free and open-source constraint modelling language. And the best thing is that you can control it by R! If you want to see how, read on!

I’d solved it once by hand as well, but here we get to see a much easier route. Constraint-based programming is one of those things which doesn’t show up very often in the business world, but I think part of the reason is that most programming languages lack the capacity to implement constraints really well. It could also be that people are usually pretty mushy about laying out proper constraints.
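
To show the flavor of stating constraints and letting the machine search, here is a small Python sketch. It uses a made-up three-house puzzle (not the Einstein puzzle from the post) and plain brute-force enumeration rather than a real constraint solver like MiniZinc; all names and clues are hypothetical.

```python
from itertools import permutations

COLORS = ("red", "green", "blue")
PETS = ("cat", "dog", "fish")

# Each constraint takes the colors and pets listed by house position (0 = leftmost).
CONSTRAINTS = [
    lambda c, p: c.index("red") + 1 == c.index("green"),  # red is immediately left of green
    lambda c, p: p.index("dog") == c.index("blue"),       # the dog lives in the blue house
    lambda c, p: p.index("cat") == c.index("green"),      # the cat lives in the green house
    lambda c, p: c.index("blue") != 0,                    # the blue house is not first
]


def solve() -> list:
    """Return every (colors, pets) assignment that satisfies all constraints."""
    return [
        (colors, pets)
        for colors in permutations(COLORS)
        for pets in permutations(PETS)
        if all(check(colors, pets) for check in CONSTRAINTS)
    ]


solutions = solve()
```

With these four clues exactly one assignment survives: houses colored red, green, blue from left to right, holding the fish, the cat, and the dog respectively. A dedicated solver prunes the search space instead of enumerating it, which is what makes the full-size puzzle practical.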


Understanding Logistic Regression

Luis Valencia explains the idea of logistic regression:

Logistic Regression is a Machine Learning classification algorithm that is used to predict the probability of a categorical dependent variable. In logistic regression, the dependent variable is a binary variable that contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.). In other words, the logistic regression model predicts P(Y=1) as a function of X.

However, unlike ordinary linear regression, in its most basic form logistic regression’s target value is a binary variable instead of a continuous value.

Read on to learn more about logistic regression. The point I like to make about logistic regression is that people brand new to it say it’s regression, because hey, it has “regression” in its name! People who are more familiar with it say that’s a misnomer and it’s really a classification algorithm, not a regression algorithm. But as Luis shows, people who are very familiar with it understand that it is a regression algorithm, which just happens to have nice classification properties because in many cases, elements get pushed to the edges (0 and 1).
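
The regression-versus-classification point can be seen in a few lines of Python. This is a minimal sketch with hand-picked, hypothetical weights (no fitting step): the model itself outputs a continuous probability P(Y=1), and the class label only appears once a threshold is applied on top.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x: list, weights: list, bias: float) -> float:
    """The regression part: P(Y=1 | x) as a continuous value in (0, 1)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


def predict_class(x: list, weights: list, bias: float, threshold: float = 0.5) -> int:
    """The classification part: a cutoff applied to the predicted probability."""
    return 1 if predict_proba(x, weights, bias) >= threshold else 0
```

For instance, `predict_proba([0.0], [1.0], 0.0)` is exactly 0.5, and moving the input far in either direction pushes the probability toward 0 or 1.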


Get-FileAgeGroup

Jeffrey Hicks builds a useful function:

If there’s one task I’ve never stopped doing, it is finding files. I am constantly creating new ways to organize files and display them in a meaningful format. Naturally, PowerShell is a great tool for this task. Get-ChildItem is obviously the proper starting point. The cmdlet works fine in getting only files from a folder and I can do basic early filtering by name and extension with wildcards. But my latest task was organizing files by date, and because of the way Get-ChildItem works under-the-hood, I’m going to need to resort to late filtering with Where-Object. There’s nothing wrong with that. But if this is a task I’m likely to repeat, then a PowerShell function is on the drawing board. My goal is to create a function that will display files grouped into aging buckets such as 1-week or 6-months. Even though I’m primarily concerned with age based on the last write time, I (or you) might have a need to base aging on the creation time. Let’s code.

Click through for a great walkthrough and code.
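
For readers outside PowerShell, the aging-bucket idea looks roughly like this in Python. This is a loose sketch, not a port of Jeffrey's function: `age_bucket`, `group_files_by_age`, and the bucket boundaries in days are all assumptions, and it only considers the last write time.

```python
import time
from collections import defaultdict
from pathlib import Path

# Upper bound in days -> bucket label (hypothetical boundaries).
BUCKETS = [(7, "1-week"), (30, "1-month"), (182, "6-months"), (365, "1-year")]


def age_bucket(age_days: float) -> str:
    """Return the label of the first bucket whose limit covers the given age."""
    for limit, label in BUCKETS:
        if age_days <= limit:
            return label
    return "older"


def group_files_by_age(folder: str, now: float = None) -> dict:
    """Group file names in a folder (non-recursive) by age of last write time."""
    now = time.time() if now is None else now
    groups = defaultdict(list)
    for path in Path(folder).iterdir():
        if path.is_file():
            age_days = (now - path.stat().st_mtime) / 86400  # seconds per day
            groups[age_bucket(age_days)].append(path.name)
    return dict(groups)
```

Calling `group_files_by_age(".")` returns a dictionary keyed by bucket label, with the matching file names as values.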


Verbose PowerShell Results

Robert Cain gets verbose on us:

In my previous post, Fun With the PowerShell Switch Parameter, I introduced the use of switches with PowerShell functions. We learned how they work, and how to create your own. For all functions, PowerShell creates a set of switches that are just “built in” to PowerShell. You do not have to explicitly create these yourself, you can simply use them and implement them within your own functions.

Two of the most used are -Verbose and -Debug. When used with the Write-Verbose and Write-Debug cmdlets they will display informational messages to the user of your functions. In this post, we’ll focus on the Verbose switch. The next post in the series will examine the Debug switch.

Click through for examples of how this can be useful.


Against Reporting Tables

Erik Darling doesn’t like reporting tables:

I’ve seen a lot of crazy ways for people to run reports on live data, but one of the worst ideas is creating a reporting table based on the set of data a user wants, and then letting them query that data.

As usual, Erik says something I want to disagree with, and then I read the post and don’t really disagree with him—or if I do, he’s already laid out the “Yes, I understand X” exception. I’ve used reporting tables to good effect, but the important thing is that they’re general-purpose and designed into the application, not specific to a single user.


Recommendations for SQL Server on VMware

Anthony Nocentino has some recommendations for us:

The intent of this post is a quick reference guide based on the recommendations made in “Architecting Microsoft SQL Server on VMware vSphere” April 2019 version. The target audience for this blog post is SQL Server DBAs, introducing them to the most impactful configurations and settings for running SQL Server in VMware.

For the explanations for each of these settings and how to configure the base VMware infrastructure, please read the “Architecting Microsoft SQL Server on VMware vSphere” guide and consult with your VMware administrators and experts.

Click through for Anthony’s summary.
