Automated Generation of Situational Judgment Tests for Civil Aviation Flight Attendants Using Large Language Models: Method and Preliminary Evaluation
Abstract
In civil aviation, the psychological competencies of cabin crew members bear directly on service quality and flight safety. Situational judgment tests (SJTs) are an effective way to assess these competencies, but they are costly and time-consuming to develop. Recent advances in large language models (LLMs) offer new opportunities to automate the development of assessment tools. Using verbatim transcripts of critical incident interviews with frontline flight attendants as the primary data source, this study aims to construct and validate a retrieval-augmented generation (RAG) workflow that generates SJT items automatically. Experts rated the quality of items generated by three LLMs (Model 1: qwen3-14b; Model 2: qwen3-32b; Model 3: deepseek-r1-32b). The results provide preliminary evidence that an automated development pathway for psychological assessment tools based on LLMs and RAG is feasible and could substantially improve item development efficiency. As an initial exploration, however, the study requires follow-up research and validation on large-scale empirical data to optimize model performance.
Keywords: LLMs, Situational Judgment Test (SJT), Automated Item Generation (AIG), Retrieval-augmented Generation (RAG), Flight Attendant Competency, Psychometrics
DOI: 10.54941/ahfe1007533

AHFE Open Access