Won’t you see my neighbor? User predictions, mental models, and similarity-based explanations of AI classifiers
Abstract
Humans should be able to work more effectively with artificial intelligence-based systems when they can predict likely failures and form useful mental models of how the systems work (Johnson et al. 2014, Klein et al. 2005, Bansal et al. 2019, Tomsett et al. 2020). We conducted a study of people’s mental models of artificial intelligence systems using a high-performing image classifier, focusing on participants’ ability to predict the classification result for a particular image. Participants viewed images in one of two classes, predicted whether the classifier would label each image correctly, and indicated their confidence in their predictions. Participants also provided their own assessment of the correct class. In this experiment, we explored the effect of giving participants additional information. We showed them an array of the image’s nearest neighbors in a space representing the otherwise uninterpretable features extracted by the lower layers of the classifier’s neural network, using t-distributed stochastic neighbor embedding. We found that providing this neighborhood information did increase participants’ prediction performance, and that the performance improvement could be related to the neighbor images’ similarity to the target image. We also found indications that the presentation of this information may influence people’s own classification of the target image: in some cases, participants who viewed the image’s neighbors were significantly less accurate in identifying the actual class of the image, particularly when the set of neighbor images included images from the incorrect class. They became “mechanomorphized” in their own judgements, rather than anthropomorphizing the classifier’s process.
There was also a significant relationship between reported confidence in predictions and accuracy, indicating that at a given level of confidence, participants in the control condition were significantly less accurate than participants in the experimental condition. In addition to the differences in mental models suggested by prediction accuracy and confidence, participants in the control and experimental conditions differed in how they described their mental models in comments on the image stimuli. Participants with less information tended to discuss image details, whereas those with more information seemingly tried to find a pattern across the images and so focused more on the classifier itself and less on the image.
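The neighbor array described above can be illustrated with a minimal sketch. It assumes each image already has 2-D coordinates in the embedded feature space (for example, produced beforehand by t-distributed stochastic neighbor embedding); the coordinates, image identifiers, class labels, and function names below are hypothetical and are not taken from the study. The sketch retrieves a target image's nearest neighbors and reports how they split across classes, which is the kind of cue that could signal a likely misclassification.

```python
import math
from collections import Counter


def nearest_neighbors(target_xy, points, k=3):
    """Return the k entries of `points` closest to `target_xy` (Euclidean distance)."""
    return sorted(points, key=lambda p: math.dist(target_xy, p["xy"]))[:k]


def class_mix(neighbors):
    """Return the fraction of neighbors that carry each class label."""
    counts = Counter(n["label"] for n in neighbors)
    return {label: count / len(neighbors) for label, count in counts.items()}


# Hypothetical embedded coordinates and labels, assumed for illustration only.
embedded = [
    {"id": "img_a", "xy": (0.0, 0.0), "label": "cat"},
    {"id": "img_b", "xy": (0.2, 0.1), "label": "cat"},
    {"id": "img_c", "xy": (0.3, 0.4), "label": "dog"},
    {"id": "img_d", "xy": (5.0, 5.0), "label": "dog"},
    {"id": "img_e", "xy": (4.8, 5.1), "label": "dog"},
]

target_xy = (0.1, 0.1)  # embedded position of the target image (assumed)
neighbors = nearest_neighbors(target_xy, embedded, k=3)
mix = class_mix(neighbors)

print([n["id"] for n in neighbors])
print(mix)
```

In this toy data the three nearest neighbors come from both classes. A mixed neighborhood of this kind corresponds to the situation the abstract singles out, where the neighbor set includes images from the incorrect class.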
Keywords: artificial intelligence, explainability, mental models, human-AI teaming
DOI: 10.54941/ahfe1001440


AHFE Open Access