
Curated SQL Posts

Installing or Upgrading SQL Server on a Workstation

Aaron Bertrand shares some advice:

Lately, I’ve seen many people struggling to upgrade their workstation to the latest version of SQL Server. The main source of the problem is usually the web installer/wrapper, which can fail for a variety of reasons that aren’t always made clear by the error messages. Today, I’m going to walk through one way you can approach a workstation upgrade and avoid these web installer issues altogether. I’ll focus on SQL Server 2022, but most of the information applies to any modern major version. Most of the information applies if you’re installing a brand-new instance, too.

Click through for several good tips and practices.
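
If you take the route of downloading the full installation media instead, the last step is typically running setup from the mounted ISO. A rough sketch of an unattended upgrade using documented setup.exe parameters (the drive letter and instance name here are assumptions; check the parameters against the official documentation for your scenario):

    :: Unattended in-place upgrade from mounted ISO media (D: assumed)
    D:\setup.exe /ACTION=Upgrade /INSTANCENAME=MSSQLSERVER /IACCEPTSQLSERVERLICENSETERMS /QUIET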


dataCoverageDefinition and DirectQuery Partitions

Chris Webb talks about hybrid tables:

Hybrid tables – tables which contain both Import mode and DirectQuery mode partitions to hold data from different time periods – have been around for a while. They are useful in cases where your historic data doesn’t change but your most recent data changes very frequently and you need to reflect those changes in your reports; you can also have “reverse hybrid tables” where the latest data is in Import mode but your historic data (which may not be queried often but still needs to be available) is in DirectQuery mode. Up to now they had a problem though: even when you were querying data that was in the Import mode partition, Power BI still sent a SQL query to the DirectQuery partition and that could hurt performance. That problem is now solved with the new dataCoverageDefinition property on the DirectQuery partition.

Read on to see what dataCoverageDefinition does.


Improving Aurora Postgres Performance by Lowering random_page_cost

Shayon Mukherjee does some performance tuning:

Recently I have been working with some queries in Postgres where I noticed it either decided not to use an index and performed a sequential scan, or used an alternative index over a composite partial index. This was quite puzzling, especially when you know there are indexes in the system that can perform these queries faster.

So what gives?

After some research, I stumbled upon random_page_cost (ref).

Read on to learn more about what this is, how you can change it, and a warning before you go wild in changing it.
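
To make the mechanics concrete, a minimal sketch (the table and query are invented; 1.1 is a common starting point for SSD-backed storage, not a universal answer):

    -- Check the current value; the Postgres default is 4
    SHOW random_page_cost;

    -- Lower it for the current session only and compare the plan
    SET random_page_cost = 1.1;
    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

    -- Persist it for a single database once you trust the change
    ALTER DATABASE appdb SET random_page_cost = 1.1;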


Accessibility in UI Design

Benedict Ampea-Badu shares some tips:

In this last segment of the series, we’ll delve into the crucial elements of navigation and flow, exploring further the part of web accessibility that ensures users can effortlessly navigate online content. Additionally, we’ll shine a spotlight on the indispensable practice of regular testing, an ongoing commitment that goes beyond compliance, providing invaluable insights into real user experiences and fostering continuous improvement.

Ensuring a seamless online experience for everyone and facilitating smooth navigation boils down to adhering to the fundamentals of web accessibility: carefully planning how websites are laid out, making it easy for users to explore different things effortlessly. When developers grasp how users interact and tackle possible issues, they go beyond just ticking boxes on a checklist. Instead, they create websites that truly put users first, considering how people actually use and prefer to browse.

This is particularly about accessibility in web applications, but many of the same principles apply to things like Power BI reports.


Using Databricks System Tables

Dustin Vannoy has a primer on system tables in Databricks:

Monitoring is important, so I’ve covered the topic a few times in the past. I’ve talked about collecting your Spark application logs and Spark metrics. These are a good way to track what is happening and what is going wrong as your code runs. In the video related to this post, I focus on a different side of monitoring: the evolving capabilities offered by Databricks System Tables. I have some sample queries and links to help you get started and begin to get value from system tables. This will need to be updated (I’ll try) as new tables go into public preview status. So let’s discuss the questions I had when I first started researching this feature:
1) What do the Databricks system tables offer me for monitoring?
2) How much does this overlap with the application logs and metrics?

Click through for a video and a walkthrough.
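
As a taste of what querying system tables looks like, here is a hedged example against the billing schema (system tables require Unity Catalog and must be enabled; the column names are from memory, so verify them in your workspace):

    -- Daily DBU consumption by SKU over the last week
    SELECT usage_date, sku_name, SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_date >= date_sub(current_date(), 7)
    GROUP BY usage_date, sku_name
    ORDER BY usage_date, dbus DESC;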


Looping in Python

Jack Wallen has us spinning in circles:

What is a loop? Other than a fruity breakfast cereal, a loop is a portion of code that repeats, either a set number of times or until a desired process is complete.

Here’s an example of a loop that many can relate to. You go to the ATM, insert your card, and start with your transaction. First, you check your balance, then you remove money from your account. The loop keeps running with each transaction until you tell the ATM you are finished.

Read on for loop examples, though for is definitely much more common and popular than while. I’d also recommend learning a bit about list comprehensions, which are a way of turning a for loop into a one-line operation. [f(x) for x in my_list] is a simple example: for each row in my_list, perform function f() with that row’s value as the input, then take all of the results of f(x) and stick them together in a new list.
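
A minimal sketch of that equivalence (the function and list are just placeholders):

    # A function to apply to each element
    def f(x):
        return x * 2

    my_list = [1, 2, 3]

    # The explicit for loop...
    results = []
    for x in my_list:
        results.append(f(x))

    # ...and the equivalent one-line list comprehension
    results = [f(x) for x in my_list]
    print(results)  # [2, 4, 6]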


Supporting Replication in dbatools

Jess Pomfret makes an announcement:

Replication: everyone loves to hate it, but it’s been our most requested feature as far as adding commands to dbatools. For ages we’ve said ‘sounds great’ and ‘we would love that’, but when we started looking into it, the energy soon fizzled away. Due to its dependency on RMO (Replication Management Objects), as opposed to SMO (SQL Server Management Objects), things weren’t as easy as we had hoped.

That said, it’s now there as of version 2.1.1. It supports snapshot, transactional, and merge replication. No peer-to-peer, but is anyone actually surprised at that?
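
If you want to poke at the new commands yourself, something like this should work (the instance name is an assumption, and the wildcard lookup is the safest way to see exactly which replication commands your version ships):

    # List the replication commands available in dbatools 2.1.1+
    Get-Command -Module dbatools -Name *-DbaRepl*

    # Inspect replication configuration on an instance (name assumed)
    Get-DbaReplServer -SqlInstance sql01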


Search Patterns in T-SQL

Erik Darling puts on the fedora and grabs the bullwhip:

First, what you should not do: a universal search string.

The problem here is somewhat obvious if you’ve been hanging around SQL Server long enough. Double wildcard searches, searching with a string type against numbers and dates, strung-together OR predicates that the optimizer will hate you for.

These aren’t problems that other things will solve either. For example, using CHARINDEX or PATINDEX isn’t a better pattern for double wildcard LIKE searching, and different takes on how you handle parameters being NULL don’t buy you much.

Read on for an example of a terrible search query, a mediocre search query, a good search query, and a possible unicorn: an actually valid reason to use a non-clustered columnstore index.
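
To make the anti-pattern concrete before you click through, here is a hedged sketch of what a universal search tends to look like (the table and columns are invented), committing all three sins at once:

    -- Double wildcards, string comparisons against numbers and dates,
    -- and strung-together OR predicates to handle a NULL parameter
    CREATE OR ALTER PROCEDURE dbo.UniversalSearch
        @SearchString nvarchar(100) = NULL
    AS
    BEGIN
        SELECT u.Id, u.DisplayName, u.CreationDate, u.Reputation
        FROM dbo.Users AS u
        WHERE @SearchString IS NULL
           OR u.DisplayName LIKE N'%' + @SearchString + N'%'
           OR CONVERT(nvarchar(30), u.CreationDate) LIKE N'%' + @SearchString + N'%'
           OR CONVERT(nvarchar(30), u.Reputation) LIKE N'%' + @SearchString + N'%';
    END;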


Clustered Index Rebuilds and Transaction Log Space

Chad Callihan talks about a potential problem:

Of all the problems that can cause a database transaction log to fill up, perhaps one lesser-known cause is a large clustered index rebuild. Let’s look at a scenario where this issue could cause a headache.

Click through for the scenario. I will say that I disagree with reorganizing the index under pretty much any circumstances. Jeff Moden, in a long and deep but extremely interesting presentation he gave to us at TriPASS three years ago, laid out why that is. I do agree that rebuilding a clustered index should be uncommon and may be a sign of a data modeling problem.

The other thing to note is that this problem is bigger when you rebuild online and your data drive is the same as your log drive. When you rebuild an index online, SQL Server needs enough space to make a copy of the data, so the rule of thumb is that you’ll have 2x the normal disk space requirements during online rebuild. It’s not necessarily exactly 2x because of fragmentation, compression, and other factors, but that’s why it’s a rule of thumb and not a “guarantee or your money back” type of thing.
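
For reference, the operation in question looks like this (the index and table names are placeholders; ONLINE = ON requires Enterprise edition or Azure SQL, and SORT_IN_TEMPDB can shift some of the sort space pressure off the data drive):

    -- Online rebuild: the table stays available, but expect roughly
    -- double the disk space while the new copy of the index is built
    ALTER INDEX PK_Orders ON dbo.Orders
    REBUILD WITH (ONLINE = ON, SORT_IN_TEMPDB = ON);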


Postgres Backup and Recovery with Barman

Muhammad Ali shows off another way to back up and restore data in Postgres databases:

Barman is a production-grade tool for managing the backup and recovery process of PostgreSQL databases. It not only handles the physical backups but also provides automatic management of retention policies, compression, a near-zero recovery point objective (RPO), and point-in-time recovery (PITR) at any desired point within the recovery window.

Barman’s Stream Archiving feature stands out as a key component in achieving zero RPO. This is achieved by using the pg_receivewal utility, which continuously backs up Write-Ahead Log (WAL) files in real time to a designated Barman server. This capability is particularly important for applications where even minimal data loss is unacceptable.

Read on for a couple of questions about it and a demonstration of how Barman works.
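
For a feel of the workflow, the core commands look something like this (the server name pg is an assumption from a typical Barman configuration, and exact options vary by version):

    # Verify connectivity, WAL archiving, and backup prerequisites
    barman check pg

    # Take a base backup of the server
    barman backup pg

    # See which backups are available to restore
    barman list-backup pg

    # Point-in-time recovery of a backup to a local directory
    barman recover --target-time "2024-01-15 12:00:00" pg <backup_id> /var/lib/pgsql/restore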
