I'm sort of speechless at how mind-bogglingly stupid every step of this process has been:
The papers attempted to train neural networks to distinguish between autistic and non-autistic children in a dataset containing photos of children’s faces. Retired engineer Gerald Piosenka created the dataset in 2019 by downloading photos of children from “websites devoted to the subject of autism,” according to a description of the dataset’s methods, and uploaded it to Kaggle, a site owned by Google that hosts public datasets for machine-learning practitioners.
The dataset contains more than 2,900 photos of children’s faces, half of which are labeled as autistic and the other half as not autistic.
After learning about a paper that cites the dataset, “I went and downloaded the dataset, and I was completely horrified,” says Dorothy Bishop, emeritus professor of developmental neuropsychology at the University of Oxford. “When I saw how it was created, I just thought, ‘This is absolute bonkers.’”
Without identifying each child in the dataset, there is no way to confirm that any of them do or do not have autism, Bishop says.