Alexander Arvidsson sets expectations:
You’ve written a prompt. It works beautifully. You ship it to production.
Three days later, someone reports wildly different answers to identical questions. You run the exact same input and get a different result than yesterday. Your test suite passes locally, fails in CI, passes again on re-run.
Welcome back to non-determinism in Large Language Models.
Click through for practical tips on reducing non-deterministic behavior, as well as the trade-offs of doing so.