Investigating common factors needed for consumers to trust AI/ML

Open Access Conference Proceedings Article
Authors: Alana Nagy, Bruce Nagy, Scot Miller

Abstract: Is there a set of trust factors that might apply to all Machine Learning (ML) algorithm types and domain applications, independent of behavioral variations? Can this common set of factors support a baseline standard represented by an ML trust scorecard? These questions are being investigated by The Technical Cooperation Program (TTCP), involving Australia, Canada, New Zealand, the United Kingdom (UK), and the United States of America (USA). This paper describes the results of an initial investigation into whether a common set of factors allows consumers to initially trust ML in critical situations. The goal was to determine whether job-role variations were statistically unaffected by confounder bias, by modeling causal relationships and analyzing influences. Through Qualtrics, questions containing factors derived from TP 8864 AI Level of Rigor, the document used by the USA and UK governments to develop official guidance, were deployed to 81 international participants in a range of technology roles, specifically developers, operators, and users. Participants worked with a mix of autonomous and ML systems used in surface, subsurface, and land domains; not all participants working with autonomous systems had ML knowledge. Introducing a Behavioral Dynamics Model (BDM) became key in designing Likert scale questions that grouped related factors into perception, needs, and experience. This design allowed a statistical investigation of whether causality between groups affects bias towards ML. The BDM survey grouped trust factors that mapped to an ML scorecard design consisting of Calibration, Experience, and Fatality (CEF) categories (an illustrative sketch of this structure follows the list below):
- Calibration (the ML algorithm's limitations and strengths; represents testing requirements):
--- (Likert scale) Perception factors investigated: Safety, Dependability, Reliability, Suspicion, and Comfortability.
--- (Likert scale) Needs factors investigated: Human Oversight, Performance, Development, Teamwork, Adaptation, Improve Ability of Success, and Proof.
- Experience (the ML algorithm's ability to conform to consumer paradigms; represents training requirements):
--- (Likert scale) Experience factors investigated: Positive History, Past Usage, Training Adequacy, and Expectation that ML Systems Fail on First Use.
- Fatality (the ML technology's ability to provide decision rationale; represents development requirements):
--- Open-ended questions: responses aligned to Perception, Needs, and Experience factors, with emphasis on demonstrating transparency, security, certification, and ethics.
Using a statistical decomposition approach, in which 19 hypotheses were investigated with ANCOVA, ANOVA, and t-test analysis, common factors for a scorecard emerged, with one exception involving Adaptation in the Calibration category. From the open-ended questions, different patterns emerged based on role variations for developer, operator, and user. The key similarity was that, to establish trust, strong evidence through observation or test is needed. The differences were that developers wanted oversight and reliability of an ML system, while users and operators generally wanted experience of the ML system's operational capability. Additionally, evidence indicated that the ML system needs to be trained to replace human interaction, either by conforming to the participant's past experiences or by ensuring that the participant is adequately trained to trust a new ML paradigm.
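As a reading aid, the sketch below expresses the CEF scorecard structure described above as a small Python data structure. The dictionary layout and the simple averaging function are assumptions made purely for illustration; only the category and factor names are taken from the abstract, and this is not the paper's actual scorecard implementation.

# Illustrative sketch of the CEF scorecard structure described in the abstract.
# The dict layout and the averaging below are assumptions for illustration;
# only the category and factor names come from the abstract text.
CEF_SCORECARD = {
    "Calibration": {  # testing requirements
        "Perception": ["Safety", "Dependability", "Reliability", "Suspicion", "Comfortability"],
        "Needs": ["Human Oversight", "Performance", "Development", "Teamwork",
                  "Adaptation", "Improve Ability of Success", "Proof"],
    },
    "Experience": {  # training requirements
        "Experience": ["Positive History", "Past Usage", "Training Adequacy",
                       "Expectation that ML Systems Fail on First Use"],
    },
    "Fatality": {  # development requirements; assessed via open-ended questions
        "Open-Ended": ["Transparency", "Security", "Certification", "Ethics"],
    },
}

def category_score(responses: dict[str, float], category: str) -> float:
    """Average a participant's Likert responses (1-5) over every factor in one CEF category."""
    factors = [f for group in CEF_SCORECARD[category].values() for f in group]
    answered = [responses[f] for f in factors if f in responses]
    return sum(answered) / len(answered) if answered else float("nan")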
The findings showed that the Behavioral Dynamics Model successfully extrapolated TP 8864 guidance into trust questions from which a common set of factors for a CEF scorecard for ML algorithms was statistically determined, independent of technical role.
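To make the statistical decomposition concrete, the sketch below runs the same family of tests (one-way ANOVA, an ANCOVA-style model with a covariate, and a pairwise t-test) on synthetic Likert-derived scores grouped by role. The data, the column names, and the years_experience covariate are hypothetical stand-ins; the survey responses and the paper's actual analysis pipeline are not reproduced here.

# Minimal sketch, assuming a per-participant trust score averaged from 5-point
# Likert items for one CEF category. All values below are simulated.
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
n = 81  # same participant count as the survey, but values are synthetic
df = pd.DataFrame({
    "role": rng.choice(["developer", "operator", "user"], size=n),
    "calibration_score": rng.normal(3.8, 0.6, size=n).clip(1, 5),  # hypothetical Calibration score
    "years_experience": rng.integers(0, 20, size=n),  # hypothetical confounder candidate
})

# One-way ANOVA: does the mean Calibration trust score differ across roles?
groups = [g["calibration_score"].values for _, g in df.groupby("role")]
f_stat, p_anova = stats.f_oneway(*groups)
print(f"ANOVA  F={f_stat:.2f}  p={p_anova:.3f}")

# ANCOVA-style model: repeat the role comparison while adjusting for the covariate,
# one way to check whether a confounder drives an apparent role effect.
model = ols("calibration_score ~ C(role) + years_experience", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))

# Pairwise comparison (developer vs. operator) with Welch's t-test.
dev = df.loc[df.role == "developer", "calibration_score"]
ops = df.loc[df.role == "operator", "calibration_score"]
t_stat, p_t = stats.ttest_ind(dev, ops, equal_var=False)
print(f"t-test developer vs operator: t={t_stat:.2f}  p={p_t:.3f}")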

Keywords: Artificial Intelligence, Machine Learning, Trust Factors, Trust Scorecard, Likert Scale Survey, Qualtrics

DOI: 10.54941/ahfe1005574
