Your trusted lead on recommenders rushes up with a new paper in hand. Just back from the RecSys conference where the paper was presented, he shows you the results. It appears your top-N recommender could be made several percentage points better using the new technique. The downside: you would have to adopt one of the new DNN collaborative filtering models, which would be far more compute-intensive and would mean a deep reskilling dive for part of your team.
Would you be surprised to find out that the results in that paper are not reproducible? Or worse, that the baseline techniques it was compared against to show those improvements were never properly optimized, and that, if they had been, the much simpler techniques would have come out on top?
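To make "properly optimized" concrete, here is a minimal, hypothetical sketch of the kind of baseline tuning that often goes missing: a plain item-kNN recommender whose one hyperparameter (neighborhood size) is swept on a held-out split and scored with a top-N metric. The synthetic interaction matrix, the grid values, and the HitRate@10 metric are my stand-ins for illustration, not the paper's evaluation setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items = 500, 200
# Hypothetical implicit-feedback matrix; in practice this would be your own data.
interactions = (rng.random((n_users, n_items)) < 0.05).astype(float)

# Leave-one-out split: hide one observed interaction per user for testing.
train = interactions.copy()
test_item = np.full(n_users, -1)
for u in range(n_users):
    seen = np.flatnonzero(train[u])
    if len(seen) > 1:
        held_out = rng.choice(seen)
        train[u, held_out] = 0.0
        test_item[u] = held_out

def item_knn_scores(train, k):
    """Score items with cosine item-item similarity, keeping each item's top-k neighbors."""
    norms = np.linalg.norm(train, axis=0) + 1e-9
    sim = (train.T @ train) / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)
    # Zero out everything except each item's k strongest neighbors.
    kth = np.sort(sim, axis=1)[:, -k][:, None]
    sim = np.where(sim >= kth, sim, 0.0)
    return train @ sim.T

def hit_rate_at_n(scores, train, test_item, n=10):
    """Fraction of test users whose held-out item appears in their top-n list."""
    scores = np.where(train > 0, -np.inf, scores)  # never re-recommend seen items
    hits = evaluated = 0
    for u, target in enumerate(test_item):
        if target < 0:
            continue
        top_n = np.argpartition(-scores[u], n)[:n]
        hits += int(target in top_n)
        evaluated += 1
    return hits / max(evaluated, 1)

# The step too many comparisons skip: actually tune the simple baseline.
for k in (5, 10, 20, 50, 100):
    hr = hit_rate_at_n(item_knn_scores(train, k), train, test_item)
    print(f"item-kNN, k={k:>3}: HitRate@10 = {hr:.3f}")
```

Nothing fancy is happening here, and that is the point: a few lines of grid search can move a "weak" baseline considerably, which is exactly what makes an untuned comparison misleading.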
In a recent paper, “Are We Really Making Much Progress?”, researchers Maurizio Ferrari Dacrema, Paolo Cremonesi, and Dietmar Jannach raise a major red flag. Houston, we have a reproducibility problem.
Having worked through some of these papers for a different algorithm, I’m not that surprised. Sometimes it seems as though the improvements hold only for the dataset and scenario the authors chose, though that may just be the cynic in me.
This article is a good argument for looking at several types of models during the research phase, and even for trying to keep several models up to date. It’s also a reminder that when you’re evaluating papers and hot algorithms, make sure they include a way to get the data used in testing (and the source code, if you can).
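Keeping several models on the scoreboard is nearly free once a tuning harness exists. Continuing the hypothetical sketch above (it reuses train, test_item, item_knn_scores, and hit_rate_at_n), a non-personalized popularity baseline can be run through the identical split and metric, the kind of simple reference point whose absence the paper criticizes:

```python
# Continues the sketch above; reuses train, test_item, item_knn_scores,
# and hit_rate_at_n so every candidate is judged by the same harness.
import numpy as np

popularity = train.sum(axis=0)                         # raw interaction counts per item
pop_scores = np.tile(popularity, (train.shape[0], 1))  # identical ranking for every user

candidates = {
    "TopPopular": pop_scores,
    "item-kNN (k=50)": item_knn_scores(train, 50),
}
for name, scores in candidates.items():
    print(f"{name:>16}: HitRate@10 = {hit_rate_at_n(scores, train, test_item):.3f}")
```

If a shiny new model can’t clearly beat both of these on your own data, that’s worth knowing before anyone signs up for the reskilling dive.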