
Label Errors in ML Test Sets

Type: Dataset
Year: 2021
image
The one on the right does indeed look like toilet paper.

Overview - What makes it notable?

We identify label errors in the test sets of 10 of the most commonly-used computer vision, natural language, and audio datasets, and subsequently study the potential for these label errors to affect benchmark results. Errors in test sets are numerous and widespread: we estimate an average of 3.4% errors across the 10 datasets, where for example 2916 label errors comprise 6% of the ImageNet validation set. Putative label errors are identified using confident learning algorithms and then human-validated via crowdsourcing (54% of the algorithmically-flagged candidates are indeed erroneously labeled). Traditionally, machine learning practitioners choose which model to deploy based on test accuracy - our findings advise caution here, proposing that judging models over correctly labeled test sets may be more useful, especially for noisy real-world datasets. Surprisingly, we find that lower capacity models may be practically more useful than higher capacity models in real-world datasets with high proportions of erroneously labeled data. For example, on ImageNet with corrected labels: ResNet-18 outperforms ResNet-50 if the prevalence of originally mislabeled test examples increases by just 6%. On CIFAR-10 with corrected labels: VGG-11 outperforms VGG-19 if the prevalence of originally mislabeled test examples increases by just 5%.

A dataset cataloging the mislabeled examples found in well-known datasets such as ImageNet, MNIST, and CIFAR-10.

image

Technology

  • A technique called Confident Learning is used to flag images that are likely mislabeled → each flagged candidate is then checked by multiple humans via crowdsourcing on Amazon Mechanical Turk
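As a rough illustration of the first step, the per-class confidence-threshold rule at the core of Confident Learning can be sketched in a few lines of NumPy. This is a simplified sketch, not the paper's actual implementation; the function name and toy data are made up for illustration, and `pred_probs` is assumed to come from out-of-sample (cross-validated) model predictions:

```python
import numpy as np

def find_candidate_label_errors(labels, pred_probs):
    """Simplified confident-learning filter.

    labels: (n,) int array of given (possibly noisy) labels
    pred_probs: (n, k) out-of-sample predicted class probabilities
    Returns indices of examples whose given label looks suspect.
    """
    n, k = pred_probs.shape
    # Per-class confidence threshold: the average predicted probability
    # of class j over the examples that are actually labeled j.
    thresholds = np.array([
        pred_probs[labels == j, j].mean() for j in range(k)
    ])
    # A candidate label error: some class other than the given label
    # clears its own threshold, while the given label does not.
    above = pred_probs >= thresholds          # (n, k) boolean
    above[np.arange(n), labels] = False       # ignore the given label
    suspect = above.any(axis=1) & (
        pred_probs[np.arange(n), labels] < thresholds[labels]
    )
    return np.flatnonzero(suspect)
```

For example, an image labeled class 0 whose model probability strongly favors class 1 would be returned as a candidate and sent on to human review.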

Further Thoughts

  • Useful as a reference for what kinds of images humans tend to misidentify.

Links

Blog post