When somebody says something "does not work," the first question should be what counts as "working". Here Donald Clark writes that an A-B test (basically, you compare two interventions side by side) shows that "the gamification lesson plan fared worse than non-gamified lesson plans." In the report (20-page PDF) from 2016, the researchers use a platform to conduct "rapid randomized controlled trials (RCT), the evidentiary gold standard of evaluation for assessing what works." This 'gold standard' consists of pretests and post-tests made up of a "set of six or ten post-exercise multiple-choice questions." In these trials, the non-gamified system reliably reported better results than the gamified system.
There are a number of things that could be said about this. I'll begin by citing a story (possibly fictional) from The 74 advocacy blog, in which a parent responds to a teacher: “Yes, I know he can write,” she sighed, “but does he have a friend? Does he ever play with anybody?” Something like this won't show up on the A-B test, of course, because it's not being measured. Nor can an A-B test measure the multiple objectives of any intervention, especially when these objectives vary from person to person. Imagine what OLDaily would look like if it were created using A-B tests as a guide. Yes, it would be more popular. But it would cover cat photos and clickbait!