Can We Trust Them? Examining the Ethical Consistency of Large Language Models to Perturbations

AHFE International

Open Access

Article

Conference Proceedings

Authors: Manuel Delaflor Rodríguez, Cecilia Delgado Solorzano, Carlos Toxtli

Abstract: The increasing reliance on Large Language Models (LLMs) raises a crucial question: can these powerful AI systems be trusted to make ethical choices? This study presents an analysis of LLM ethical behavior, examining 25,200 queries across 24 different models, including both proprietary and open-source variants. We evaluate LLM responses to 70 ethical vignettes spanning six domains, employing a novel perturbation methodology to assess the robustness of their ethical decision-making under varying contexts and framing. Our findings reveal that while larger models generally exhibit higher consistency, particularly with chat-style instructions, significant variations emerge when models face contextual changes and stakeholder adjustments, and across different ethical domains. To explain these findings, we introduce a novel framework, survival-relevant pattern recognition, which argues that ethical behavior in both humans and AI arises from recognizing and responding to patterns associated with survival and social cohesion.

Keywords: Ethics, AI, LLM, GPT

DOI: 10.54941/ahfe1005925
