Sheldon Hull has an essay on site reliability engineering in practice:
I’ve always been focused on building resilient systems, sometimes to my own detriment velocity wise. Balancing the momentum of delivery features and improving reliability is always a tough issue to tackle. Automation isn’t free. It requires effort and time to do correctly. This investment can help scaling up what a team can handle, but requires slower velocity initially to do it right.
How do you balance automating and coding solutions to manual fixes, when you often can’t know the future changes in priority?
This is personal experience rather than prescriptive guidance, and very interesting personal experience at that.