I used to spend a lot of time doing patching, and I had plenty of times when:
Servers wouldn’t come back up after a reboot. Someone had to go into the iLO/RIB card and give them a firm shove.
Shutdown took forever. SQL Server can be super slow to shut down! I understand this better after reading a recent post on the “SQL Server According to Bob” blog. Bob Dorr explains that when SQL Server shuts down, it waits for all administrator (sa) level commands to complete. So, if you’ve got any servers where jobs or applications are running as sa, well…. hope they finish up fast.
Patching accidentally interrupted something important. Some process running from an app server, etc., failed because patching rebooted the server, and it fired off alarms that had to be cleaned up.
Something failed during startup after reboot. A service started and failed, or a database wasn’t online. (Figuring out “was that database offline before we started?” was the first step. Ugh.)
Miscommunication caused a problem on a cluster. Whoops, you were working on node2 while I was working on node1? BAD TIMES.
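The "was that database offline before we started?" question is easier to answer if you record database states before patching and compare them afterward. Here is a minimal sketch of that comparison in Python. It assumes you have already captured each snapshot as a dictionary of database name to state, for example from the name and state_desc columns of sys.databases; the function name and the "MISSING" marker are made up for illustration.

```python
def diff_database_states(before, after):
    """Return {name: (old_state, new_state)} for databases whose state changed.

    `before` and `after` are assumed to be dicts of database name -> state
    string, captured just before patching and just after the reboot.
    """
    changes = {}
    for name, old_state in before.items():
        # "MISSING" is a placeholder for a database absent from the later snapshot.
        new_state = after.get(name, "MISSING")
        if new_state != old_state:
            changes[name] = (old_state, new_state)
    # Databases that only appear in the later snapshot.
    for name in after.keys() - before.keys():
        changes[name] = ("MISSING", after[name])
    return changes


# Hypothetical snapshots: Archive was already offline, so it is not reported.
before = {"Sales": "ONLINE", "Archive": "OFFLINE", "HR": "ONLINE"}
after = {"Sales": "ONLINE", "Archive": "OFFLINE", "HR": "RECOVERY_PENDING"}
changes = diff_database_states(before, after)
```

With these sample snapshots, only HR shows up in the result, because Archive was offline both before and after the patch.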
This is a really good post. Kendra’s done a lot more patching than I have, and she’s definitely thought about it in more detail. Me, I’m waiting for the day, which is very close for some companies, when you don’t patch servers at all. Instead, you spin up and down virtual apps and virtual servers which are already fully patched. It’s a lot harder to do with databases than with app servers, but if you separate data from compute, your compute nodes are interchangeable. When a new OS patch comes out, you spin up new machines which have the patch installed, they take over for the old ones, and after a safe period, you can delete the old versions forever. If there’s a failure, shut down the new version, spin the old version back up, and you’re back alive.
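To make the roll-forward and roll-back flow concrete, here is a toy model in Python. It is only a sketch of the bookkeeping, not a real orchestration tool: the ComputePool class, its method names, and the version labels are all hypothetical, and the health check is a callback you would have to supply yourself.

```python
from dataclasses import dataclass, field


@dataclass
class ComputePool:
    """Hypothetical model of interchangeable compute in front of separate data."""
    active: str                                   # image version serving traffic
    retained: list = field(default_factory=list)  # older versions kept for rollback

    def roll_forward(self, new_version, healthy):
        """Cut over to new_version only if healthy(new_version) returns True."""
        if not healthy(new_version):
            return False  # the old version keeps serving; nothing changes
        self.retained.append(self.active)
        self.active = new_version
        return True

    def roll_back(self):
        """Shut down the current version and bring back the most recent retained one."""
        if not self.retained:
            raise RuntimeError("no retained version to roll back to")
        failed = self.active
        self.active = self.retained.pop()
        return failed

    def retire_old(self):
        """After the safe period, delete the retained versions for good."""
        retired, self.retained = self.retained, []
        return retired


pool = ComputePool(active="image-unpatched")
pool.roll_forward("image-patched", healthy=lambda version: True)
```

The key design point is that the old version stays in the retained list until you explicitly retire it, so rolling back is just swapping which version is active.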