This repository contains the code and images used in Jacquot et al., 2020 [arXiv, CVPR]. A 1-minute video presentation of our work is available on YouTube.
Our work builds on the observation that the image datasets used in machine learning contain many biases, and that convolutional neural networks exploit these biases to classify images. For example, in the UCF101 dataset, algorithms can rely on background color alone to classify human activities.
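As an illustration of such a shortcut (not code from the paper), the sketch below trains a classifier that sees only each image's mean color; above-chance cross-validated accuracy from three numbers per image would indicate a color bias. The loader `load_dataset` is a hypothetical placeholder.

```python
import numpy as np
from PIL import Image
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def mean_color(path):
    """Reduce an image to its average (R, G, B): no shapes, no objects."""
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
    return img.reshape(-1, 3).mean(axis=0)

paths, labels = load_dataset()  # hypothetical loader returning file paths and 0/1 labels
X = np.stack([mean_color(p) for p in paths])
y = np.asarray(labels)

# If accuracy is clearly above chance, color alone is a usable shortcut.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(f"mean accuracy from color alone: {scores.mean():.2f}")
```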
To address this issue, we followed a rigorous method to build three image datasets corresponding to three human behaviors: drinking, reading, and sitting. Below are some example images from our datasets. The models misclassified the bottom-left, top-middle, and bottom-right pictures, whereas humans classified all six correctly.
We reduced biases in our image datasets by applying 100 to 300 cross-validations of a fine-tuned deep convolutional network (computer-vision/keras/misclassification_rate and computer-vision/matlab/alexnet_misclass_rate.m). The many cross-validations allowed us to rank images by their misclassification rate. We then excluded images that were classified too easily. Thus, we obtained datasets that were less biased and more difficult for algorithms to classify.
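The following is a minimal sketch of this ranking procedure, assuming a binary yes/no task; `build_finetuned_model`, `images`, and `labels` are hypothetical placeholders (NumPy arrays), and the actual scripts are in the paths above.

```python
import numpy as np

n_images = len(images)
errors = np.zeros(n_images)       # times each image was misclassified
appearances = np.zeros(n_images)  # times each image landed in a test split

for _ in range(300):  # 100 to 300 cross-validations, per the paper
    idx = np.random.permutation(n_images)
    test, train = idx[: n_images // 5], idx[n_images // 5 :]

    model = build_finetuned_model()  # pretrained CNN with a new yes/no head
    model.fit(images[train], labels[train], epochs=5, verbose=0)

    preds = (model.predict(images[test]) > 0.5).astype(int).ravel()
    errors[test] += preds != labels[test]
    appearances[test] += 1

# Rank by misclassification rate and drop the easiest (lowest-rate) images.
rate = errors / np.maximum(appearances, 1)
keep = np.argsort(rate)[n_images // 10 :]  # e.g., discard the easiest 10%
```

The exclusion threshold shown here is illustrative; the point is that images a network classifies correctly across almost all splits are the ones most likely to carry exploitable biases.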
The ground-truth label for each image was created by asking three participants to assign the image to a yes or no class for each action. We also conducted a separate psychophysics experiment (human-vision): images were presented to human participants, and each trial consisted of a fixation cross (500 ms), image presentation (50, 150, 400, or 800 ms), and a forced-choice yes/no question.
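Below is a minimal sketch of one such trial, assuming PsychoPy (an assumption; the actual experiment code lives under human-vision). Timing via `core.wait` is approximate, and a real experiment would lock brief presentations to screen refreshes.

```python
import random
from psychopy import visual, core, event

win = visual.Window(color="grey", units="pix")
fixation = visual.TextStim(win, text="+")
image = visual.ImageStim(win, image="example.jpg")  # hypothetical stimulus file

# Presentation durations from the paper, in seconds.
duration = random.choice([0.050, 0.150, 0.400, 0.800])

fixation.draw(); win.flip(); core.wait(0.5)       # fixation, 500 ms
image.draw(); win.flip(); core.wait(duration)     # brief image presentation
win.flip()                                        # blank screen
response = event.waitKeys(keyList=["y", "n"])     # forced-choice yes/no
win.close()
```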