John Cook describes a paradox with neural nets:
Deep neural networks have enough parameters to overfit the data, but there are various strategies to keep this from happening. A common way to avoid overfitting is to deliberately do a mediocre job of fitting the model.
When it works well, the shortcomings of the optimization procedure yield a solution that differs from the optimal solution in a beneficial way. But the solution could fail to be useful in several ways. It might be too far from optimal, or deviate from the optimal solution in an unhelpful way, or the optimization method might accidentally do too good a job.
Conceptually, this feels a little weird but isn’t really much of a problem, as we have other analogues: rational ignorance in economics (where we knowingly choose not to know something because the benefit is not worth the opportunity cost of learning), OPTIMIZE FOR UNKNOWN with SQL Server (where we knowingly do not use the passed-in parameter because we might get stuck in a lesser path), etc. But the specific process here is interesting.