Kaarel Moppel has a public service announcement:
At last week’s local Postgres user group meetup here in Estonia, one of the topics was HA and recent improvements in Patroni (the most popular cluster manager for Postgres) in supporting quorum commit, which, by the way, has been usable on its own for years. Things went deep quickly and we learned quite a bit, of course, including a good reminder that you shouldn’t build your bank on Patroni’s default synchronous mode 🙂
Anyways, during the hallway track (which is sometimes as valuable as the real ones), I got an interesting question: with some 3+ quorum nodes, is Postgres then 100% bulletproof against all kinds of failures? Excluding meteorites, rogue DBAs and such, of course. One could think so, right? Nope.
Read on to learn what might cause failure in that scenario. Guaranteeing synchronous replication between machines over a network is a surprisingly difficult challenge.