Press "Enter" to skip to content

Curated SQL Posts

A Mechanism for Change Management for DBAs

Terri Hurley shares one path:

Back in the day, when I received an email from a developer requesting a production change, I would copy and paste SQL code from the email into SQL Server and run it against a production database. I never bothered to save that code, so later, when I was asked for a history of database changes, it was awkward to track down all the previous email requests asking for production changes. Fast forward to the present day: companies of all sizes have embraced Change Management Separation of Duties.

My preferred path to change management is:

  • All changes go into source control. You can either have the people making changes create the actual scripts to run, or generate them from base changes in source control, dealer’s choice.
  • Each changeset goes to a specific environment when it gets PR’d into the appropriate branch—that is, dev, QA, test, pre-prod, prod, post-prod, next-gen-post-dev-unprod, whatever.
  • DBAs are on pull requests, at least after a certain point. PR to dev? Let the team handle that code review. Once you move nearer to prod, at least one DBA performs a code review.
  • After PR gating happens, your CI/CD system automatically deploys the changes, assuming they pass any continuous integration tests.

This takes time to get right, but it’s very much worth the effort. I worked at a company that built its release management process on top of Git and Azure DevOps. You can also use tools like Flyway.
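To make the versioned-migration idea concrete, here is a minimal sketch in Python of what tools like Flyway automate: migration scripts live in source control and each one is applied exactly once, in order, per environment. The file layout, naming convention, and sqlite3 target are my own illustration, not anything from Terri’s post.

    # Minimal sketch of versioned migrations: scripts live in source control
    # and are applied exactly once, in order. Hypothetical file layout:
    #   migrations/V001__create_orders.sql
    #   migrations/V002__add_orders_index.sql
    import sqlite3
    from pathlib import Path

    def migrate(db_path: str, migrations_dir: str = "migrations") -> None:
        conn = sqlite3.connect(db_path)
        # Track which versions have already been applied to this database.
        conn.execute("CREATE TABLE IF NOT EXISTS schema_history (version TEXT PRIMARY KEY)")
        applied = {row[0] for row in conn.execute("SELECT version FROM schema_history")}
        for script in sorted(Path(migrations_dir).glob("V*.sql")):
            version = script.name.split("__")[0]
            if version in applied:
                continue  # already deployed to this environment
            conn.executescript(script.read_text())
            conn.execute("INSERT INTO schema_history (version) VALUES (?)", (version,))
            conn.commit()

    migrate("dev.db")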

Contrasting Power BI and Power BI Report Server

Soheil Bakhshi makes me sad:

Depending on your organisation’s preferences, data governance requirements, and the platforms you intend to use for report deployment, you may use either Power BI Desktop, the “standard version”, or Power BI Desktop RS (Report Server). Power BI Desktop has variations tailored to meet specific needs, such as cloud-based analytics or on-premises reporting. While many users might only encounter the standard version, there’s another important variant for specialised scenarios.

You know the meme where the kid asks, “Mom, I want Power BI!” and mom says, “We have Power BI at home.” Power BI Report Server is Power BI at home, in all of the senses of the joke. I’ve been stuck with it at two employers and although it is better than nothing, you quickly find out how annoying it gets when that thing you know how to do in Power BI Online doesn’t work because they never supported it in PBIRS.

Power BI Writeback via Fabric SQL Database

Jon Voge gives us a use case for Fabric SQL Databases:

Until recently, Fabric allowed us to choose only between Lakehouses and Warehouses as a backend. For write-back use cases, neither is ideal.

  • The SQL Endpoint of a Lakehouse is read-only, making writes from Power Apps impossible.
  • While the SQL Endpoint of a Warehouse is write-enabled, it does not support enforced Primary Keys, which are a hard requirement for Power Apps to write directly to a data source.

Jon briefly describes two mechanisms people have used, and then shows how you can do this more effectively with a Fabric SQL Database. Based on the article, it seems that you could probably still do the same with an Azure SQL Database, though I suppose handling the managed identity could be an issue.
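Purely as an illustration of why the enforced primary key matters, here is a hedged sketch of a writeback table and upsert in Python via pyodbc. The server, database, table, and authentication settings are all placeholders I made up; this is not from Jon’s article.

    import pyodbc

    conn = pyodbc.connect(
        "Driver={ODBC Driver 18 for SQL Server};"
        "Server=your-fabric-sql-endpoint.database.fabric.microsoft.com;"  # placeholder
        "Database=WritebackDb;"                                           # placeholder
        "Authentication=ActiveDirectoryInteractive;"
    )
    cur = conn.cursor()
    # An enforced PRIMARY KEY: what Warehouses lack and Power Apps requires.
    cur.execute("""
        IF OBJECT_ID('dbo.BudgetAdjustments') IS NULL
            CREATE TABLE dbo.BudgetAdjustments (
                AdjustmentId INT NOT NULL PRIMARY KEY,
                Amount DECIMAL(18, 2) NOT NULL
            );
    """)
    # Upsert keyed on the primary key.
    cur.execute("UPDATE dbo.BudgetAdjustments SET Amount = ? WHERE AdjustmentId = ?;", 250.00, 42)
    if cur.rowcount == 0:
        cur.execute("INSERT INTO dbo.BudgetAdjustments (AdjustmentId, Amount) VALUES (?, ?);", 42, 250.00)
    conn.commit()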

Data Visualization in Matplotlib

Rajendra Gupta generates some graphics:

Data analysis requires analysts to handle structured, semi-structured, or unstructured data. Small datasets with few rows and columns are easy to understand. However, as data complexity increases with many interlinked variables, getting insights from data in a tabular format becomes challenging. According to a recent study from MIT, the human brain processes an entire image in just 13 milliseconds. Therefore, it is helpful to learn Python and visualization together.

How do we use Python to generate plots from the data to analyze patterns, correlations, and trends? What plots are available, and how do we use them with customizations? Let’s explore them in this tip.

There are a few visualization libraries in Python I prefer over matplotlib, and for static graphics, ggplot2 in R has pretty much everything else beat. But matplotlib is essentially the standard, so it’s important to know.
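If you have never touched matplotlib, a plot with a few of the customizations Rajendra covers looks something like this (the data is invented):

    import matplotlib.pyplot as plt
    import numpy as np

    x = np.linspace(0, 10, 100)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(x, np.sin(x), label="sin(x)", linewidth=2)
    ax.scatter(x[::10], np.cos(x[::10]), color="crimson", label="cos(x) samples")
    ax.set_title("Trend and samples")
    ax.set_xlabel("x")
    ax.set_ylabel("value")
    ax.legend()
    ax.grid(alpha=0.3)
    plt.show()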

GROUP BY Clause Order and Performance in PostgreSQL

Andrei Lepikhov runs some tests:

PostgreSQL users often employ analytical queries that sort and group data by different rules. Optimising these operators can significantly reduce the time and cost of query execution. In this post, I will discuss one such optimisation: choosing the order of columns in the GROUP BY expression.

Postgres can already reshuffle the list of grouped expressions according to the ORDER BY condition to eliminate additional sorting and save computing resources. We went further and implemented an additional strategy of group-by-clause list permutation in a series of patches (the first attempt and the second one) for discussion with the Postgres community, expecting it to be included in the next version of PostgreSQL core. You can also try it in action in the commercial Postgres Pro Enterprise fork.

From a “this is a 4th generation language” perspective, there should never be a performance difference in the ordering of grouped columns in a GROUP BY clause. The database optimizer should always determine the optimal ordering. Read Andrei’s article to see how much of a challenge that can be in practice.
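If you want to try this against your own data, a quick harness for comparing the plans of two orderings of the same GROUP BY might look like the sketch below (psycopg 3 assumed; the table and column names are placeholders):

    import psycopg

    queries = [
        "EXPLAIN ANALYZE SELECT low_card, high_card, count(*) FROM t GROUP BY low_card, high_card",
        "EXPLAIN ANALYZE SELECT low_card, high_card, count(*) FROM t GROUP BY high_card, low_card",
    ]
    with psycopg.connect("dbname=test") as conn:
        for q in queries:
            print(q)
            # Each row of EXPLAIN ANALYZE output is a single text column.
            for (line,) in conn.execute(q).fetchall():
                print("   ", line)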

Building a Gantt Chart

Amy Esselman wants to know if you’re on time or have any slack:

A Gantt chart is used to visually present tasks or events over time. It consists of a horizontal axis of time and horizontal bars representing the duration of each item arranged along the vertical axis.

The visual was popularized in the early 1900s by American engineer and management consultant Henry Gantt, for whom the chart is named. Most commonly, this view is used to see the start date and duration of each task in a larger project to better understand timelines, milestones, and dependencies.

Click through for an overview of the chart, where it can be useful, and some of its limitations.
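If you would rather roll your own than reach for project management software, a bare-bones Gantt chart in matplotlib (with made-up tasks and dates) takes only a few lines:

    import matplotlib.dates as mdates
    import matplotlib.pyplot as plt
    from datetime import date

    tasks = [
        ("Requirements", date(2024, 1, 1), date(2024, 1, 10)),
        ("Design", date(2024, 1, 8), date(2024, 1, 20)),
        ("Build", date(2024, 1, 18), date(2024, 2, 15)),
        ("Test", date(2024, 2, 10), date(2024, 2, 25)),
    ]
    fig, ax = plt.subplots(figsize=(8, 3))
    for i, (name, start, end) in enumerate(tasks):
        # One horizontal bar per task: position on the date axis, width in days.
        ax.barh(i, (end - start).days, left=mdates.date2num(start), height=0.5)
    ax.set_yticks(range(len(tasks)))
    ax.set_yticklabels([name for name, _, _ in tasks])
    ax.invert_yaxis()  # first task on top, as in a typical Gantt view
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
    plt.tight_layout()
    plt.show()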

An Overview of the Naive Bayes Class of Algorithms

Harris Amjad takes us through a rather useful class of algorithms for classification:

As AI and Machine Learning have increased in popularity, especially Large Language Models, more professionals have explored how these systems work. Unfortunately, some put the cart before the horse, taking on more complex algorithms before laying the foundation, and their interest in the topic fades as a result. This tip will introduce a simple probabilistic, yet powerful classifier, the Naïve Bayes Model, and implement it in Python.

I like using the Naive Bayes variants, despite the fact that it is not Bayesian and arguably isn’t very naive. The reason I like to use this class of algorithm is that it’s fast, easy, and gives you a useful baseline for quality. If you need to meet some specific quality threshold (say, accuracy > 85% or F1-score above 0.8), you can get an answer quickly with Naive Bayes. If that answer is anywhere near your threshold, the problem is likely solvable. If your answer is way below the threshold, it’s probably not worth spending the time or compute effort trying out a variety of other algorithms.
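Here is a sketch of that baseline workflow with scikit-learn; the data is synthetic, so swap in your own features, labels, and quality bar:

    from sklearn.datasets import make_classification
    from sklearn.metrics import accuracy_score, f1_score
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    # Synthetic stand-in for your real dataset.
    X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    model = GaussianNB().fit(X_train, y_train)
    preds = model.predict(X_test)
    print(f"accuracy: {accuracy_score(y_test, preds):.3f}")
    print(f"F1 score: {f1_score(y_test, preds):.3f}")
    # If these land near your threshold (say, accuracy > 0.85), the problem is
    # probably solvable with a richer model; if they are far below, maybe not.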

Analyzing Semantic Model Logs via Microsoft Fabric

Sandeep Pawar parses the logs:

Workspace Monitoring was one of my favorite announcements at MS Ignite ‘24 this week. It logs events from Fabric items such as Semantic Models, Eventhouse, and GraphQL to a KQL database that’s automatically provisioned and managed in that workspace. Currently it’s limited to these three items, but hopefully others (especially Spark and pipelines) will be added soon. Read the announcement by Varun Jain (PM, Microsoft) for details.

Click through for some thoughts from Sandeep, as well as a variety of useful queries.
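For a flavor of what querying those logs from Python could look like, here is a hedged sketch using the azure-kusto-data package. The cluster URI, database name, table name (SemanticModelLogs), and column names are all assumptions on my part; check your own workspace, and Sandeep’s queries, for the real ones.

    from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

    cluster = "https://your-monitoring-eventhouse.kusto.fabric.microsoft.com"  # placeholder
    kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster)
    client = KustoClient(kcsb)

    # Count yesterday's semantic model events by operation (assumed schema).
    query = """
    SemanticModelLogs
    | where Timestamp > ago(1d)
    | summarize count() by OperationName
    | order by count_ desc
    """
    response = client.execute("Monitoring KQL database", query)  # assumed DB name
    for row in response.primary_results[0]:
        print(row["OperationName"], row["count_"])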

Generating Effect Plots in Python and R

Michael Mayer builds some effect plots:

The plots show different types of feature effects relevant in modeling:

  • Average observed: Descriptive effect (also interesting without model).
  • Average predicted: Combined effect of all features. Also called “M Plot” (Apley 2020).
  • Partial dependence: Effect of one feature, keeping other feature values constant (Friedman 2001).
  • Number of observations or sum of case weights: Feature value distribution.
  • R only: Accumulated local effects, an alternative to partial dependence (Apley 2020).

Click through to see how they both work.
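To ground a couple of those definitions, here is a small hand-rolled sketch with scikit-learn on synthetic data: average predicted values within bins of one feature, plus partial dependence for the same feature.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import partial_dependence

    X, y = make_regression(n_samples=500, n_features=5, random_state=0)
    model = RandomForestRegressor(random_state=0).fit(X, y)

    # Average predicted within quintile bins of feature 0 (an "M plot" style view).
    edges = np.quantile(X[:, 0], np.linspace(0, 1, 6))
    bin_idx = np.digitize(X[:, 0], edges[1:-1])
    preds = model.predict(X)
    for b in range(5):
        print(f"bin {b}: average predicted = {preds[bin_idx == b].mean():.2f}")

    # Partial dependence: vary feature 0 while keeping other features at
    # their observed values, then average the predictions.
    pd_result = partial_dependence(model, X, features=[0], grid_resolution=5)
    print("partial dependence:", np.round(pd_result["average"][0], 2))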
