John Mount shows us how to use the psagg function in wrapr to ensure that functional dependencies are valid:
Notice only grouping columns and columns passed through an aggregating calculation (such as max()) are passed through (the column z is not in the result). Now because y is a function of x no substantial aggregation is going on, we call this situation a “pseudo aggregation” and we have taught this before. This is also why we made the seemingly strange choice of keeping the variable name y (instead of picking a new name such as max_y), we expect the y values coming out to be the same as the one coming in, just with changes of length. Pseudo aggregation (using the projection y[[1]]) was also used in the solutions of the column indexing problem.
Our wrapr package now supplies a special case pseudo-aggregator (or in a mathematical sense: projection): psagg(). It works as follows.
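In this post, John uses the term pseudo-aggregation for grouping on a functional dependency: a case where the value of y is fully determined by the value of x, and where x and y may each span any number of columns. Because every row in a group already shares the same y, the "aggregation" only collapses duplicates rather than computing anything new.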
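To make the idea concrete, here is a minimal sketch of a checked pseudo-aggregator. It is written in plain Python rather than R, and it is not wrapr's implementation of psagg(). The names pseudo_agg and group_pseudo_agg and the sample rows are hypothetical and chosen only for illustration. The sketch returns the single shared value of each group and raises an error when the assumed functional dependency does not hold.

```python
from collections import defaultdict


def pseudo_agg(values):
    """Return the one value shared by the whole group.

    Hypothetical helper: raises ValueError if the group is empty or if its
    values are not all equal, i.e. if the functional dependency is violated.
    """
    values = list(values)
    if not values:
        raise ValueError("cannot pseudo-aggregate an empty group")
    first = values[0]
    if any(v != first for v in values[1:]):
        raise ValueError(f"values are not constant within the group: {values!r}")
    return first


def group_pseudo_agg(rows, key, col):
    """Group rows by `key` and pseudo-aggregate column `col` in each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[col])
    return {k: pseudo_agg(vs) for k, vs in groups.items()}


# Illustrative data: y is a function of x, z is not.
rows = [
    {"x": "a", "y": 1, "z": 10},
    {"x": "a", "y": 1, "z": 20},
    {"x": "b", "y": 2, "z": 30},
]

# y collapses cleanly to one value per x.
result = group_pseudo_agg(rows, "x", "y")
print(result)

# z varies within x == "a", so the check fails loudly instead of silently
# picking a value.
try:
    group_pseudo_agg(rows, "x", "z")
except ValueError as err:
    print("dependency violated:", err)
```

The design choice worth noting is the failure mode. Using max() or taking the first element would also return y here, but either one quietly hides bad data when y is not actually determined by x. A checked projection turns that hidden assumption into an explicit error.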