For this to actually happen we need the actual system to be in our concept space, a lot of training data, and an abundance of caution.
In practice what we see more and more is the training procedure in fact attacks the evaluation procedure. It doesn’t just improve the quality of the fit artifact, but through mere optimization accidentally exploits weaknesses in the measurement system itself. When this happens, fitting does the following.
In ML training, we often accidentally “teach to the test” by comparing models via test data, which over time selects for models which are better fits for the test data. As John notes, this can come two separate ways and if you don’t define your optimization strategy correctly, you can accidentally train models which optimize on non-realistic things. A classic example is the neural network which could pick out malignant tumors from non-malignant tumors not because of any property of the tumor itself but rather because the malignant tumor images all had rulers in them and the non-malignant images did not. Read the whole thing for a second pitfall you can hit when training models.