In a recent blog post, Luke Oakden-Rayner (@DrLukeOR) discusses some drawbacks of the AI-competition approach to advancing machine learning. While most can agree that AI competitions are great for networking, recruiting, and learning, @DrLukeOR is skeptical about the usefulness of the models that ‘win’ these competitions.
AI competitions are fun, community building, talent scouting, brand promoting, and attention grabbing. But competitions are not intended to develop useful models.
@DrLukeOR walks through several scenarios to explain this slightly controversial point of view. In summary, the evaluation used to identify winning teams is statistically under-powered, primarily because the test sets are too small to separate the vast majority of competing models. Additionally, testing a model once or twice is insufficient to assess its accuracy. The post also touches on competition models over-fitting the data, which makes it difficult to apply them to real-world problems, specifically in medicine (@DrLukeOR’s field).
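To get a feel for why a small test set struggles to separate close competitors, here is a rough sketch of the sampling error in a measured accuracy. It is not taken from @DrLukeOR's post: the function names, the 90% accuracy, and the test-set sizes are made-up illustrations, and it uses the simple normal approximation to a binomial proportion, which assumes independent test examples.

```python
import math


def accuracy_margin(accuracy: float, n_test: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for an accuracy
    measured on n_test independent examples (normal approximation)."""
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n_test)


def min_test_size(accuracy: float, margin: float, z: float = 1.96) -> int:
    """Smallest test-set size whose margin of error is no larger than `margin`."""
    return math.ceil(z ** 2 * accuracy * (1.0 - accuracy) / margin ** 2)


if __name__ == "__main__":
    # Hypothetical competition: top models score around 90% accuracy.
    for n in (1_000, 10_000, 100_000):
        m = accuracy_margin(0.90, n)
        print(f"n_test={n:>7}: accuracy known to within +/- {100 * m:.2f} points")

    # Test-set size needed to pin accuracy down to half a percentage point.
    print(min_test_size(0.90, 0.005))
```

Under these assumptions, a 1,000-example test set pins a 90% accuracy down only to within roughly two percentage points, so leaderboard entries that differ by a fraction of a point are hard to tell apart. A careful comparison of two models scored on the same test set would use a paired test rather than separate intervals, so treat this as an intuition aid only.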
In response to this critique, community members pointed out that competitions foster the development of novel AI frameworks that push the field forward and, as @DrLukeOR puts it, they’re ‘fun, community building, talent scouting, brand promoting…’ opportunities. I would add that the winners receive financial awards and bragging rights. That said, some restructuring of how models are assessed or how money is awarded (maybe to the top 10 models instead of the top model?) might help to assuage some of these concerns.
If you would like to learn more about AI competitions, check out this post. If you want to play around with ImageNet predictions (and face off against the model), check out this project from Stanford. If you would like to learn more about over-fitting and AI competitions, here are a few links from the community:
- Why rankings of biomedical image analysis competitions should be interpreted with care
- Do ImageNet Classifiers Generalize to ImageNet?
- Do CIFAR-10 Classifiers Generalize to CIFAR-10?
- Cold Case: The Lost MNIST Digits
- Do Better ImageNet Models Transfer Better?
- Measuring the Tendency of CNNs to Learn Surface Statistical Regularities