Curated SQL Posts

Building a Better .gitconfig

Colin Gillespie digs in:

Getting started with Git is easy (ha!), but once you’ve mastered the basics, it’s natural for developers to start thinking about customising their git process. Most Git settings live in the .gitconfig file. In this blog post, I’ll discuss what you should consider setting in your config file to make a more efficient development environment.

There are some interesting settings here that I hadn’t heard of but can see making sense.

More Spark Jobs, Fewer Notebooks

Miles Cole lays out an argument:

I’m guilty. I’ve peddled the #NotebookEverything tagline more than a few times.

To be fair, notebooks are an amazing entry point to coding, documentation, and exploration. But this post is dedicated to convincing you that notebooks are not, in fact, everything, and that many production Spark workloads would be better executed as a non-interactive Spark Job.

Miles has a “controversial claim” at the end that I don’t think is particularly controversial at all. I agree with pretty much the entire article, especially around the difficulties of testing notebooks properly.

SSMS Updates and Code Completions

Brent Ozar wants an update:

A long time ago in a galaxy far, far away, SQL Server Management Studio was included as part of the SQL Server installer.

Back then, upgrading SSMS was not only a technical problem, but a political one too. Organizations would say things like, “Sorry, we haven’t certified that cool new SQL Server 1982 here yet, so you can’t have access to the installer.” Developers and DBAs were forced to run SSMS from whatever ancient legacy version of SQL Server that their company had certified.

Working in a controlled industry, I still get to hear that answer.

Offline Installation of SSMS 22

Nivritti Suste grabs the bits:

Beginning with SQL Server Management Studio 21, Microsoft stopped providing a direct download package of the SSMS binaries. Instead, the download is just the SSMS installer, which then downloads whatever it needs during installation. Sometimes you may need to do an offline installation on a machine without internet access. In this article, we walk through the steps to do an offline install of SSMS.

I would be curious to know how large the installation folder is, considering that it grabs all possible options.

Trying the Regex-Based Replace Function in Excel

Ben Richardson checks out a new function:

Instead of building up a few different text functions on top of each other, you can now use regex inside Excel formulas to search for patterns and clean data much more efficiently.

Our favourite of these new additions is REGEXREPLACE, which lets you find text based on patterns and replace data in one simple formula.

Read on to see how the REGEXREPLACE() function works.
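
For readers who think in code rather than formulas, here is a rough Python analogue of pattern-based replacement using the standard library's `re.sub`. This is not Excel's implementation; the helper names `digits_only` and `dmy_to_iso` are hypothetical, and the first one does roughly what a formula like `=REGEXREPLACE(A2, "[^0-9]", "")` should do in a worksheet.

```python
import re


def digits_only(text: str) -> str:
    # Remove every character that is not a digit (e.g. to normalise phone numbers).
    return re.sub(r"[^0-9]", "", text)


def dmy_to_iso(text: str) -> str:
    # Rewrite DD/MM/YYYY dates as YYYY-MM-DD by reordering capture groups.
    return re.sub(r"\b(\d{2})/(\d{2})/(\d{4})\b", r"\3-\2-\1", text)


print(digits_only("(555) 123-4567"))  # 5551234567
print(dmy_to_iso("Due 31/12/2024"))   # Due 2024-12-31
```

One pattern replaces what would otherwise be a chain of nested SUBSTITUTE calls, which is the main appeal of the regex functions.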

Dealing with NULL and Empty String in Multiple RDBMS Platforms

Akhil Reddy Banappagari compares three popular platforms:

When you are planning database migrations to PostgreSQL, it is usually the small things that cause the biggest production bugs. One of the most common traps for developers is how different databases handle NULL and empty strings ('').

While they might seem like similar concepts, representing the absence of a value, the way a database engine interprets them can change your query results, break your unique constraints, or cause data loads to fail. In this guide, we will compare the behavior of Oracle, SQL Server, and PostgreSQL to help you avoid common migration pitfalls.

PostgreSQL and SQL Server are close in how they deal with NULL and empty strings, but all three platforms differ at least somewhat, so if you’re deeply familiar with one, the next platform may trip you up a little.
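
To make the distinction concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in database. SQLite is not one of the three platforms compared, but for the checks below it behaves like PostgreSQL's defaults: `''` is a real value distinct from NULL, and a UNIQUE column accepts multiple NULLs while rejecting duplicate empty strings. Oracle, by contrast, treats a zero-length `VARCHAR2` string as NULL, and SQL Server's unique constraints allow only one NULL, so the same script would behave differently there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# An empty string is a value, not NULL.
empty_is_null = cur.execute("SELECT '' IS NULL").fetchone()[0]

# NULL never compares equal, even to itself; the result is NULL (unknown).
null_equals_null = cur.execute("SELECT NULL = NULL").fetchone()[0]

# COALESCE replaces NULL but leaves an empty string untouched.
coalesced = cur.execute("SELECT COALESCE(NULL, 'n/a'), COALESCE('', 'n/a')").fetchone()

# Unique constraints: multiple NULLs are allowed here, duplicate '' values are not.
cur.execute("CREATE TABLE t (code TEXT UNIQUE)")
cur.execute("INSERT INTO t VALUES (NULL)")
cur.execute("INSERT INTO t VALUES (NULL)")
cur.execute("INSERT INTO t VALUES ('')")
try:
    cur.execute("INSERT INTO t VALUES ('')")
    duplicate_empty_rejected = False
except sqlite3.IntegrityError:
    duplicate_empty_rejected = True

null_rows = cur.execute("SELECT COUNT(*) FROM t WHERE code IS NULL").fetchone()[0]

print(empty_is_null, null_equals_null, coalesced, duplicate_empty_rejected, null_rows)
# 0 None ('n/a', '') True 2
```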

Implementing the OPTICS Clustering Algorithm in SQL Server

Sebastiao Pereira implements an algorithm:

Ordering points to identify the clustering structure (OPTICS) is an algorithm for finding density-based clusters, very similar to DBSCAN. However, OPTICS handles clusters of varying density more effectively, giving deeper insight into the hierarchical structure of your data. This algorithm is generally more computationally intensive.

Is it possible to have the OPTICS clustering algorithm implemented in SQL Server without using an external solution?

Click through for that implementation.
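
For a feel of what the algorithm computes before reading the T-SQL version, here is a compact Python sketch. It is not the author's implementation: it uses brute-force O(n²) neighbor searches, a heap with lazy deletion for the seed list, and the convention (used by scikit-learn, among others) that `min_pts` counts the point itself. The output is an ordering plus a reachability distance per point; jumps to `inf` or to large values mark where a new cluster begins.

```python
import heapq
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def optics(points: Sequence[Point], eps: float, min_pts: int) -> Tuple[List[int], List[float]]:
    """Return the OPTICS visit order and each point's reachability distance."""
    n = len(points)
    reach = [math.inf] * n
    processed = [False] * n
    order: List[int] = []

    def neighbors(i: int) -> List[int]:
        # Brute-force eps-neighborhood; includes the point itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def core_distance(i: int, nbrs: List[int]) -> Optional[float]:
        # Distance to the min_pts-th closest neighbor (counting i itself),
        # or None when i is not a core point.
        if len(nbrs) < min_pts:
            return None
        return sorted(math.dist(points[i], points[j]) for j in nbrs)[min_pts - 1]

    def update(i: int, nbrs: List[int], core: float, seeds: list) -> None:
        # Lower the reachability of unprocessed neighbors and queue them.
        for j in nbrs:
            if processed[j]:
                continue
            new_reach = max(core, math.dist(points[i], points[j]))
            if new_reach < reach[j]:
                reach[j] = new_reach
                heapq.heappush(seeds, (new_reach, j))

    for start in range(n):
        if processed[start]:
            continue
        processed[start] = True
        order.append(start)
        nbrs = neighbors(start)
        core = core_distance(start, nbrs)
        if core is None:
            continue
        seeds: list = []
        update(start, nbrs, core, seeds)
        while seeds:
            _, q = heapq.heappop(seeds)
            if processed[q]:
                continue  # stale heap entry superseded by a smaller reachability
            processed[q] = True
            order.append(q)
            q_nbrs = neighbors(q)
            q_core = core_distance(q, q_nbrs)
            if q_core is not None:
                update(q, q_nbrs, q_core, seeds)
    return order, reach


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
order, reach = optics(pts, eps=3.0, min_pts=3)
print([reach[i] for i in order])  # [inf, 1.0, 1.0, 1.0, inf, 1.0, 1.0, 1.0]
```

Plotting the reachability values in visit order gives the "valleys" that OPTICS uses to reveal cluster structure; the two `inf` entries here separate the two example clusters.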

The Basics of Transaction Logging

Paul Randal republishes an older post that is no longer available:

All through my career as a data professional, both inside Microsoft and as a consultant, I’ve found that one of the most misunderstood parts of SQL Server is the transaction log. Lack of knowledge of how the transaction log works and needs to be managed, or just simple misconceptions, can lead to all kinds of production problems, including:

  • The transaction log growing out-of-control and potentially running out of space
  • Performance issues from repeated shrinking of the transaction log
  • Performance issues from a problem known as VLF fragmentation, which I discussed in this post
  • The inability to recover to a desired point in time using transaction log backups
  • The inability to perform a tail-log backup during disaster recovery (see here for an explanation of tail-log backups)
  • Various issues around failovers and restore performance

With this post I’m starting an occasional series (now here on my SQLskills blog) on the transaction log and how it works and should be managed, and I’ll touch on all the problems above over its course. In this post I’ll explain what logging is and why it’s required.

Read on for that explanation.
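
As a rough illustration of why logging is required, here is a toy write-ahead log in Python. It is a deliberately simplified model, not how SQL Server implements logging: there are no checkpoints, no log truncation, and no page-level LSN tracking, and the class and method names are hypothetical. It does show the two core ideas: every change is logged (with a before-image) before the data is modified, and after a crash the log is used to redo committed work that never reached disk and to undo uncommitted work that did.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

ABSENT = object()  # marks "key did not exist" in a before-image


@dataclass
class LogRecord:
    lsn: int
    txn: str
    kind: str  # "update" or "commit"
    key: Optional[str] = None
    before: Any = ABSENT
    after: Any = ABSENT


class ToyDatabase:
    def __init__(self) -> None:
        self.log: List[LogRecord] = []   # durable: survives a crash
        self.disk: Dict[str, Any] = {}   # durable, but may be stale or hold uncommitted data
        self.cache: Dict[str, Any] = {}  # in-memory pages: lost on crash

    def _append(self, **fields: Any) -> None:
        self.log.append(LogRecord(lsn=len(self.log) + 1, **fields))

    def update(self, txn: str, key: str, value: Any) -> None:
        # Write-ahead rule: log the change before modifying the in-memory page.
        self._append(txn=txn, kind="update", key=key,
                     before=self.cache.get(key, ABSENT), after=value)
        self.cache[key] = value

    def commit(self, txn: str) -> None:
        # Durable once the commit record is in the log, even if pages are not on disk.
        self._append(txn=txn, kind="commit")

    def flush(self) -> None:
        # Pages can be written at any time, including uncommitted changes.
        self.disk = dict(self.cache)

    def crash_and_recover(self) -> None:
        state = dict(self.disk)  # the cache is gone; start from what is on disk
        committed = {r.txn for r in self.log if r.kind == "commit"}
        # Redo: replay every logged change in log order.
        for r in self.log:
            if r.kind == "update":
                state[r.key] = r.after
        # Undo: roll back uncommitted changes, newest first.
        for r in reversed(self.log):
            if r.kind == "update" and r.txn not in committed:
                if r.before is ABSENT:
                    state.pop(r.key, None)
                else:
                    state[r.key] = r.before
        self.cache = state
        self.disk = dict(state)


db = ToyDatabase()
db.update("T1", "a", 1); db.commit("T1"); db.flush()
db.update("T2", "a", 99); db.update("T2", "b", 5); db.flush()  # uncommitted work hits disk
db.update("T3", "c", 7); db.commit("T3")                         # committed, never flushed
db.crash_and_recover()
print(db.cache)  # {'a': 1, 'c': 7}
```

Without the log, neither T3's committed change nor the rollback of T2's partial work could be recovered, which is the essence of why logging is required.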

NOWAIT Hints and Annoyances with Query Store Hints and Plan Guides

Erik Darling performs a rather late Airing of Grievances:

In this video, I delve into some of the frustrations and annoyances associated with query store hints and plan guides in SQL Server. I explore how these tools can sometimes hinder rather than help, particularly when trying to override certain behaviors or improve performance. For instance, I demonstrate the quirks of using a `NOWAIT` hint in a transactional context and highlight why Query Store’s inability to support table hints is such a significant limitation. Additionally, I discuss the cumbersome nature of plan guides, especially their requirement for maintaining semantics-affecting hints that might be detrimental to query performance. These issues underscore the need for more robust and flexible tools within SQL Server to better meet the diverse needs of database administrators and developers.

Click through for the video.

The Challenge of Many-to-Many Relationships in Power BI

Ben Richardson explains a common anti-pattern in Power BI semantic models:

Relationships sit at the heart of literally everything you do in Power BI.

Before you make measures, visuals and reports, relationships are established to define how your data fits together. Their job is simple on the surface – but vital: describe how each table is connected.

If you can design these relationships well, everything else will run much smoother.

Across any data domain, strong models rely on clear Grain, correct Cardinality, and a Star Schema built with well-defined Fact and Dimension tables.

Read on to understand how many-to-many relationships stress this understanding in Power BI and to learn different techniques for dealing with those sorts of relationships.
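
The core difficulty shows up even without Power BI. In this Python sketch with hypothetical data, customers share joint bank accounts through a bridge table. Each customer's total is correct on its own, but adding those totals (or naively summing the joined rows) double counts the shared account, while the true grand total counts each account once. This non-additivity is what a well-modelled many-to-many relationship in a semantic model has to handle, and why visual totals there often do not equal the sum of the visible rows.

```python
# Hypothetical data: account balances (the "fact" grain) and a customer-account bridge.
account_balance = {"A1": 100, "A2": 50}
customer_account = [("Alice", "A1"), ("Bob", "A1"), ("Bob", "A2")]

# Per-customer totals: each customer sees every account linked to them.
per_customer = {}
for customer, account in customer_account:
    per_customer[customer] = per_customer.get(customer, 0) + account_balance[account]

# Naively summing the joined rows double counts the shared account A1.
naive_total = sum(account_balance[a] for _, a in customer_account)

# The correct grand total counts each distinct account once.
true_total = sum(account_balance[a] for a in {a for _, a in customer_account})

print(per_customer, naive_total, true_total)
# {'Alice': 100, 'Bob': 150} 250 150
```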