Diagnosing a Partition Job Failure after Migration to an AG

Mike Lynn describes a customer issue:

Quick Summary

A client noticed that one of their reporting tables had stopped logging new information after the first day of the new month.

Context

This environment ran SQL Server 2019 in an Always On Availability Group configuration hosted on AWS EC2 servers. The issue appeared roughly 30-45 days after the servers were migrated from a SQL Server Failover Cluster Instance on AWS EC2 to the new AG setup.

Read on for the problem, the discovery process, and the solution. I like reading this sort of report specifically for the process. One of the best skills you can develop in any technical field is methodical troubleshooting: read and understand the error message (perhaps with the help of a search engine or your tool of choice), then work logically through the possible causes until you find the real one. It sounds obvious when I describe it that way, but far too often people flail about, trying a variety of arbitrary things because they don't really understand the issue and hope that this one change will fix whatever is going wrong.