Press "Enter" to skip to content

Curated SQL Posts

Determining the Right Batch Size for Deletes

Jess Pomfret breaks out the lab coat and safety goggles:

I found myself needing to clear out a large amount of data from a table this week as part of a clean up job.  In order to avoid the transaction log catching fire from a long running, massive delete, I wrote the following T-SQL to chunk through the rows that needed to be deleted in batches. The question is though, what’s the optimal batch size?

I usually go with a rule of thumb: 1K for wide tables (in terms of columns and row size) or when there are foreign key constraints, 10K for medium-width tables, and about 25K for narrow tables. But if this is an operation you run frequently, it’s worth experimenting a bit.

Comments closed

Building a Pipeline for External Data Sharing

Hope Foley has data to share:

I worked with a customer recently who had a need to share CSVs for an auditing situation.  They had a lot of external customers that they needed to collect CSVs from for the audit process.  There were a lot of discussions happening on how to best do it, whether we’d pull data from their environment or have them push them into theirs.  Folks weren’t sure on that so I tried to come up with something that would work for both. 

Read on for Hope’s solution to the problem.

Comments closed

Tracking SQL Server Uptime

Garry Bargsley has a cmdlet for us:

This week’s blog post will help you check your SQL Servers up-time. There are numerous reasons I can think of that you would want to know how long your SQL Server has been online. Was the server recently patched, did it crash and come back online, or did someone restart it by mistake? These are all valid questions about a single SQL Server or your entire estate. I will show you how you can easily check one too many servers quickly for uptime.

We will start by using every DBA’s favorite PowerShell module…  dbatools

Admittedly, I’d just check the start time for the tempdb database, but this cmdlet does give more info.

Comments closed

Reasons for Partitioning in SQL Server

Erik Darling has opinions:

When I work with clients, nearly every single one has this burning question about partitioning.

“We’ve got this huge table, should we partition it?”

“Do you need to insert or delete data in big chunks?”

“No, it’s all transactional.”

“Do you have last page contention problems?”

“No, but won’t it help performance?”

“No, not unless you’re using clustered column store.”

“…”

Read on to unpack Erik’s argument. I do wish that there were more good cases for partitioning in SQL Server, but they’re almost all in the analytics space—which is part of why partitioning is a lot more useful in Azure Synapse Analytics dedicated SQL pools.

Comments closed

Testing IOPS, Latency, and Throughput: an Analogy

Brent Ozar has a timely analogy for us:

You’re trying to decide whether to use DHL, FedEx, UPS, or your local postal service.

You could measure them by sending me a gift – after all, it is the holidays, and I do a lot of stuff for you year round, and it’s probably the least you could do.

– You place one box outside

– You call the shipping company to come pick it up, and start the clock

– When I get it, I’ll call you to confirm receipt, and you stop the clock

Click through for the rest of the story.

Comments closed

Using Scala at Databricks

Li Haoyi gives us a peek behind the curtain:

With hundreds of developers and millions of lines of code, Databricks is one of the largest Scala shops around. This post will be a broad tour of Scala at Databricks, from its inception to usage, style, tooling and challenges. We will cover topics ranging from cloud infrastructure and bespoke language tooling to the human processes around managing our large Scala codebase. From this post, you’ll learn about everything big and small that goes into making Scala at Databricks work, a useful case study for anyone supporting the use of Scala in a growing organization.

It’s always interesting to see how the largest companies handle certain classes of problems. From this post, we can get an idea of the high-level requirements and usage, making it worth the read.

Comments closed

What Good Data Governance Means

Paul Andrew shares some thoughts on the true meaning of data governance:

… Someone asked me what I thought ‘good’ Data Governance might look like for a given data platform – warehouse or analytics solution. This included all aspects of data governance is was a very broad question.

To add some actual context, not related to Star Wars, this was during the time when Microsoft started talking about version 2 of the Azure Data Catalogue offering and what else could/should be included in a suite of governance tools. Certainly, long before the days of what we now call Azure Purview. It was also a time when GDPR had a lot of focus for the data community as we battled with the principals and technical implications.

Anyway, with a recent fresh perspective on things, I’ve decided to dust off my original notes and attempt to distill the answer of good data governanace into the following areas. What I also find with data governanace that the more I experience in the industry the deeper my perspective on the subject goes.

Paul has put together a lengthy answer on the topic, well worth the read.

Comments closed

Batch Mode and Window Functions

I wind down a series on window functions:

SQL Server typically operates in row mode, which means that an operator processes one row at a time. This sounds inefficient, but tends to work out pretty well in practice. However, something which may work out even better is to process more than one row at a time, especially when the number of rows gets to be fairly large. Enter batch mode.

Batch mode was introduced in SQL Server 2012 alongside non-clustered columnstore indexes. It became interesting in SQL Server 2016 and very interesting in SQL Server 2019. That’s because 2016 introduced writable clustered columnstore indexes and 2019 gives us batch mode outside of columnstore indexes.

There are some nice potential performance gains for queries involving window functions.

Comments closed

tempdb Usage and Recommendations

Chad Callihan has a two-parter for us. First up is a look at tempdb usage:

The tempdb database in SQL Server holds a little bit of everything over its existence. There can be the temporary tables that you created, work tables like spools that SQL Server creates, version store data, and much more in tempdb at any time.

Is your tempdb having a hard time keeping up with the workload on your SQL Server? How can you find out what is using tempdb the most?

Chad also recommends picking the right number of files for tempdb:

We just made it through Thanksgiving dinner and there’s another big Christmas dinner coming up in few weeks. Sometimes you might find that one plate is not enough to hold all of that food. Just as multiple plates can make big dinners easier to manage, multiple tempdb data files can make the SQL Server workload easier to manage.

Read on to see how to track and modify this setting.

Comments closed