Jens Vestergaard says don’t trust, do verify:
In two previous posts I went down the path of getting a semantic model ready for AI: descriptions on every measure, an instructions file, the schema tidied up enough that a Fabric data agent has something real to read. That work has a satisfying endpoint. The model looks ready.
Ready is not the same as right.
The kind of evaluation Jens is talking about is fundamental to good business intelligence practice, whether or not you throw language models into the mix. Where language models do add complexity is in the arbitrary scope of the questions, the ambiguity in how people phrase them, and the stochastic nature of the answers. All of that makes the problem harder, though at least it isn't an entirely different class of problem to solve.
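To make the stochastic part concrete, here is a minimal sketch of one way to check answers against known-good values: ask each question several times and record how often the answer matches. Everything in it is an assumption for illustration. `ask` stands in for whatever call reaches your agent, `pass_rates` and `make_stub_agent` are hypothetical helpers, and none of this is a Fabric API or the method from Jens's post.

```python
import math
from typing import Callable, Dict, Iterable, List, Tuple


def pass_rates(
    ask: Callable[[str], float],
    cases: Iterable[Tuple[str, float]],
    runs: int = 5,
    rel_tol: float = 1e-9,
) -> Dict[str, float]:
    """Ask each question `runs` times; return the share of matching answers.

    `ask` is a hypothetical stand-in for the agent call. It is assumed to
    return a single numeric answer for a question.
    """
    rates: Dict[str, float] = {}
    for question, expected in cases:
        # Count the runs whose answer is numerically close to the expected value.
        hits = sum(
            math.isclose(ask(question), expected, rel_tol=rel_tol)
            for _ in range(runs)
        )
        rates[question] = hits / runs
    return rates


def make_stub_agent(answers: Dict[str, List[float]]) -> Callable[[str], float]:
    """Build a fake agent that replays canned answers, one per call.

    This exists only so the sketch runs without a real agent behind it.
    """
    iterators = {question: iter(values) for question, values in answers.items()}
    return lambda question: next(iterators[question])


if __name__ == "__main__":
    # Made-up questions and values; replace with measures you have verified by hand.
    agent = make_stub_agent(
        {
            "Total sales last year?": [100.0, 100.0, 90.0, 100.0],
            "How many active customers?": [5.0, 5.0, 5.0, 5.0],
        }
    )
    cases = [("Total sales last year?", 100.0), ("How many active customers?", 5.0)]
    for question, rate in pass_rates(agent, cases, runs=4).items():
        print(f"{rate:.0%}  {question}")
```

A pass rate rather than a single pass/fail is the point here: a question that is right three times out of four is a different problem from one that is always wrong, and a single run cannot tell them apart.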