Automated Generation of Situational Judgment Tests for Civil Aviation Flight Attendants Using Large Language Models: Method and Preliminary Evaluation

Yaqian Liu; Qida Hao; Jian Cheng; Cuixia Ma; Bo Jia; Peiru Chen; Gang Jie; Jingyu Zhang

doi:10.54941/ahfe1007533

AHFE International

Open Access

Article

Conference Proceedings

Abstract

In civil aviation, the psychological competencies of cabin crew members bear directly on service quality and flight safety. Situational judgment tests (SJTs) are an effective way to assess these competencies, but they are costly and time-consuming to develop. Recent advances in large language models (LLMs) offer new opportunities for automating the development of assessment tools. Using verbatim transcripts of critical incident interviews with frontline flight attendants as the primary data source, this study aims to construct and validate a retrieval-augmented generation (RAG)-driven workflow for automatically generating SJT items. Expert evaluation was used to assess the quality of items generated by three LLMs (Model 1: qwen3-14b; Model 2: qwen3-32b; Model 3: deepseek-r1-32b). The results provide preliminary evidence that an automated pathway for developing psychological assessment tools, based on LLMs and RAG, is feasible and can significantly improve item development efficiency. However, this study is an initial exploration; further research and validation with large-scale empirical data are needed to optimize model performance.

Keywords: LLMs, Situational Judgment Test (SJT), Automated Item Generation (AIG), Retrieval-augmented Generation (RAG), Flight Attendant Competency, Psychometrics
