Can We Trust Them? Examining the Ethical Consistency of Large Language Models to Perturbations
Open Access
Article
Conference Proceedings
Authors: Manuel Delaflor Rodríguez, Cecilia Delgado Solorzano, Carlos Toxtli
Abstract: The increasing reliance on Large Language Models (LLMs) raises a crucial question: can these powerful AI systems be trusted to make ethical choices? This study presents an analysis of LLM ethical behavior, examining 25,200 queries across 24 different models, including both proprietary and open-source variants. We evaluate LLM responses to 70 ethical vignettes spanning six domains, employing a novel perturbation methodology to assess the robustness of their ethical decision-making under varying contexts and framing. Our findings reveal that while larger models generally exhibit higher consistency, particularly with Chat-style instructions, significant variations emerge when faced with contextual changes, stakeholder adjustments, and across different ethical domains. To explain these findings, we introduce a novel framework, survival-relevant pattern recognition, which argues that ethical behavior in both humans and AI arises from recognizing and responding to patterns associated with survival and social cohesion.
Keywords: Ethics, AI, LLM, GPT
DOI: 10.54941/ahfe1005925