Failure Modes In Event-Based Systems

Dave Copeland has an interesting article on understanding how message- and event-based systems can fail:

The system trigger (1) initiates everything. Common failures here are timeouts inside Application. This is particularly insidious because when this happens, the System Trigger may retry the operation. Think about a user on a webpage getting a 500 error. They will likely retry what they were doing until it succeeds.

This means that the entire workflow could be triggered multiple times, and it could be done in a way that is not programmatically obvious. Imagine our Merchandise buyer marking down an item’s price, and the entire operation succeeds but at the last minute their Internet connection dies and they get an error. They will repeat the markdown action and now there will be two messages about the inventory price being sent.

This is an interesting read.  Also, definitely check out Dave’s earlier post on how there is no happy path; it seems that most developers only code for a chimera, as there is so much code that assumes everything will work perfectly.

Related Posts

Replicating Extra-Long Strings

Monica Rathbun walks us through a replication error: Ever seen the below error? Until this week I hadn’t. So, I figured I’d take a little time and introduce it to those that had not. Error Description: Length of LOB data (65754) to be replicated exceeds configured maximum 65536. Use the stored procedure sp_configure to increase […]

Read More

Troubleshooting SQL Server Error 18456 State 73

Thomas Rushton reproduces an error state in SQL Server: A question asked on one of the forums today wasn’t easily answerable by Googling. Summary of the question “I have error 18456 State 73 – why?” Google seemed remarkably quiet on the subject of that particular state code. Even Aaron Bertrand’s list of causes of state codes […]

Read More

Categories

August 2017
MTWTFSS
« Jul Sep »
 123456
78910111213
14151617181920
21222324252627
28293031