Is LLM a reliable risk detector? An evaluation of large language models in EMR-related medical incident detection
Abstract
Medical institutions typically rely on manual analysis of adverse medical events, which demands substantial human resources, time, and specialized expertise. These demands limit how effectively potential risks can be identified. Can large language models (LLMs) apply their natural language processing capabilities to serve as reliable risk detectors? In this pilot study, we evaluate how effectively LLMs identify medical incident risks related to electronic medical record (EMR) systems. We first curated a dataset of 573 medical incident reports that had been manually analyzed. Then, using a few-shot prompting approach, we designed instructions to evaluate five LLMs: GPT-4o, Claude 3.5 Sonnet, DeepSeek V3, Nova Pro, and Llama 3.1-405b. The results indicate that the best-performing LLMs accurately extracted more than half of the risk factors and generated reasonable explanations grounded in real-world case contexts. While general-purpose LLMs can provide some assistance, further optimization tailored to specific medical scenarios is needed to improve their handling of complex cases.
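To make the few-shot setup concrete, the following is a minimal sketch of how such an evaluation might be structured. It is not the authors' implementation: the function names, the instruction wording, the example report text, and the exact-match recall metric are all assumptions introduced for illustration, and the abstract does not specify how extracted risk factors were scored against the manual analysis.

```python
from typing import List, Sequence, Tuple

# Hypothetical instruction text; the paper's actual prompt is not given in the abstract.
INSTRUCTION = (
    "Identify the EMR-related risk factors in the incident report. "
    "List one risk factor per line."
)


def build_few_shot_prompt(
    examples: Sequence[Tuple[str, List[str]]], report: str
) -> str:
    """Assemble instruction, worked examples, and the new report into one prompt."""
    parts = [INSTRUCTION]
    for example_report, example_factors in examples:
        factor_lines = "\n".join(f"- {factor}" for factor in example_factors)
        parts.append(f"Report: {example_report}\nRisk factors:\n{factor_lines}")
    # The final block is left open so the model completes the risk factor list.
    parts.append(f"Report: {report}\nRisk factors:")
    return "\n\n".join(parts)


def risk_factor_recall(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Share of manually identified risk factors that the model also extracted.

    Assumption: case-insensitive exact string matching, a deliberate
    simplification of whatever matching procedure a real study would use.
    """
    reference_set = {factor.strip().lower() for factor in reference}
    predicted_set = {factor.strip().lower() for factor in predicted}
    if not reference_set:
        return 0.0
    return len(reference_set & predicted_set) / len(reference_set)


# Illustrative placeholder data only; not taken from the study's dataset.
examples = [
    ("The allergy alert was not visible on the order screen.", ["alert visibility"]),
]
prompt = build_few_shot_prompt(examples, "An order was entered for the wrong patient.")
recall = risk_factor_recall(
    ["Wrong patient selected", "alert fatigue"],
    ["wrong patient selected", "interface design"],
)
```

In this sketch, the prompt would be sent to each model under evaluation, and the returned lines would be parsed into the `predicted` list before scoring.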
Keywords: Healthcare safety, Large Language Models, Medical incidents, Prompt engineering, Risk factors
DOI: 10.54941/ahfe1006630


AHFE Open Access