Human Factors and Simulation

Editors: Julia Wright, Daniel Barber

Topics: Simulation and Modelling

Publication Date: 2022

ISBN: 978-1-958651-06-3

DOI: 10.54941/ahfe1001484

Articles

Novices as models of expert operators: Evidence from the NRC Human Performance Test Facility

Humans are integral to the safe operation of a nuclear power plant (NPP). Following the Three Mile Island accident in 1979, the United States Nuclear Regulatory Commission (NRC) began focusing on incorporating good human factors engineering design principles in regulation and emphasizing the importance of adequate training of plant operations staff. As part of this focus, NRC amended its regulations to require facility licensees to have simulation facilities for use in administering NRC operating tests and licensed operator requalification training (52 FR 9460). Since then, the simulator has become an important tool for operator training and license examinations. As technology develops, new designs and technologies become available to the nuclear power community. The staff of NRC is responsible for reviewing and determining the acceptability of new designs to ensure they support safe plant operations. Since the human operator is vital to NPP safety, NRC must understand the potential impact of new designs on human performance to support sound regulatory decisions (Hughes, D’Agostino, & Reinerman-Jones, 2017). Despite the importance of human performance in plant safety, much of the basis for current NRC Human Factors Engineering guidance is from other domains (e.g., aviation, defense), qualitative data from operational experiences in NPPs, and limited empirical studies in a nuclear environment (Hughes & D’Agostino, 2016). To close this data gap, NRC launched the Human Performance Test Facility (HPTF) project to explore the impact of new designs, technologies, and concepts of operations on human performance using generic simulator platforms. One of the challenges for conducting human performance research in the nuclear domain is access to trained operators. Without sufficient sample size, it is difficult to perform analyses with adequate statistical power and draw substantial conclusions.
To overcome the participant access challenge, NRC partnered with the University of Central Florida (UCF) and used college students as a proxy for expert operators to study the impact of traditional and new Main Control Room (MCR) designs, technologies, and concepts of operations on performance of common NPP tasks and on physiological and subjective workload. This approach follows the principle of “equal but different”. This means that students experienced simplified versions of complex tasks and the system user interface. This paper will review data collected from three experiments and summarize the evidence revealed by using novices as models of expert operators in the nuclear domain. Across the experiments, novices and expert operators interacted with touchscreen or desktop versions of an NPP MCR interface. Performance and workload were examined. Additionally, the studies sought to validate the methodology of the “equal but different” principle. Taken together, the studies revealed that the “equal but different” method induced comparable cognitive demands in students and experts. This means that student novices can stand in for expert operators to help identify workload-related safety concerns in the nuclear domain. Future research will extend this approach to other MCR technologies, such as automation and novel control room configurations. Further, the method developed in the HPTF can be applied in other domains where access to experts is limited.

Jinchao Lin, Gerald Matthews, Niav Hughes, Kelly Dickerson
Open Access
Article
Conference Proceedings

The importance of assessing both expert and non-expert populations to inform expert performance

Realizing the benefits of research for human factors applications requires that academic theory and applied research in operational environments work in tandem, each informing the other. Mechanistic theories about cognitive processing gain insight from incorporating information from practical applications. Likewise, human factors implementations require an understanding of the underlying nature of the human operators that will be using those very implementations. This interplay holds great promise, but is too often thwarted by information from one side not flowing to the other. On one hand, basic researchers are often reluctant to accept research findings from complex environments and a relatively small number of highly-specialized participants. On the other hand, industry decision makers are often reluctant to believe results from simplified testing environments using non-expert research participants. The argument put forward here is that both types of data are fundamentally important, and explicit efforts should bring them together into unified and integrated research programs. Moreover, effectively understanding expert performance requires assessing non-expert populations. For many fields, it is critically important to understand how operators (e.g., radiologists, aviation security officers, military personnel) perform in their professional setting. Extensive research has explored a breadth of factors that can improve, or hinder, operators’ success; however, the vast majority of these research endeavors hit the same roadblock: it is practically difficult to test specialized operators. They can be hard to gain access to, have limited availability, and sometimes there just are not enough of them to conduct the needed research. Therefore, non-expert populations can provide a much-needed resource.
Specifically, it can be highly useful to create a closed-loop ecosystem wherein an idea rooted in an applied realm (e.g., radiologists are more likely to miss an abnormality if they just found another abnormality) is explored with non-experts (e.g., undergraduate students) to affordably and extensively explore a number of theoretical and mechanistic possibilities. Then, the most promising candidate outcomes can be brought back to the expert population for further testing. With such a process, researchers can explore possible ideas with the more accessible population and then only use the specialized population with vetted research paradigms and questions. While such closed-loop research practices offer a way to best use available resources, the argument here is also that it is necessary to assess non-experts to fully understand expert performance. That is, even if researchers have full access to a large number of experts, they still need to test non-experts. Specifically, assessing non-experts allows for quantifying fundamentally important factors, such as strategic vs. perceptual drivers of performance and the time course of learning. Many of the potential gains in the applied sphere come from selecting the best people to train into becoming experts; without non-expert performance it is impossible to know how to enact that selection or to divorce the effects of extensive practice and expertise from the operational environment. While there has been an, at times, adversarial relationship between research practices that use non-expert vs. expert participants, the proposal here is that embracing both is vital for fully understanding the nature of expert performance.

Stephen Mitroff, Emma Siritzky, Samoni Nag, Patrick Cox, Chloe Callahan-Flintoft, Andrew Tweedell, Dwight Kravitz, Kelvin Oie
Open Access
Article
Conference Proceedings

Toward a Consequential Validity Perspective for Selecting Participant Groups in Testing and Evaluation Studies for Complex Systems

Testing and evaluation of technology design for complex systems cannot readily attain conclusive results. This is because skilled professionals are often not available for testing while non-professionals may not be capable of operating the actual systems or high-fidelity simulators. Thus, practitioners and applied scientists can be challenged with decisions on selecting participant groups, which can severely constrain choices in the experimental tasks. This article presents the perspective of consequential validity, highlighting that general validity or general rules for participant selection probably do not exist. Most importantly, the validity of a testing method or an empirical finding critically rests on the decisions of interest, which must take into account nuances or idiosyncrasies of specific situations and desired outcomes. This perspective stands in contrast to how the literature predominantly portrays validity of testing methods or empirical findings as universal rather than focusing on outcomes within the confines of the study methods. The perspective of consequential validity calls for studies on how classical metrics of reliability and validity could manifest in consequence of specific decisions informed by empirical testing.

Nathan Lau, Ronald Boring
Open Access
Article
Conference Proceedings

Studying Control Room Operations on a Shoestring Budget - Reflections on the Rancor Microworld

As the U.S. continues to develop and mature advanced reactor designs, the nuclear industry is becoming increasingly aware of the need for good human factors to ensure safe, reliable, effective, and economical concepts of operations. Advanced reactor designs aim to reduce staffing, and significant operational costs, by adopting high levels of automation. The highly automated control system designs must be informed with human factors and human reliability data. The proposed concepts of operations are unlike the current, largely manual, concept of operations found in operating nuclear power plants. Human performance data collection has proven difficult for existing nuclear power plants. Human factors researchers working on advanced reactor designs will encounter these same fundamental challenges and more. The concepts of operations and accompanying human-system interfaces are novel and require human performance data for validation and licensing. Methods to evaluate novel concepts of operations for diverse advanced reactor designs must be identified to aid vendors in their system design activities. The Rancor microworld is a simulation platform that is currently used to support advanced reactor vendors in developing their control room concepts. The rationale and historical use of the Rancor microworld demonstrate a unique approach that is complementary to traditional full-scope simulator data collection methods that rely on expert licensed operators. The Rancor microworld is a reduced-order model of a small modular reactor conceived and developed to support human performance research on nuclear operations topics. The microworld represents the core elements of a nuclear power plant sans the complexity associated with full-scope simulators that are typically used to support human factors and human reliability research.
The impetus for the microworld as an alternative method to acquire human performance data stems from the challenges in performing full-scope simulator studies. Full-scope simulators are expensive to build and maintain. Furthermore, they require extensive expertise to develop scenarios to support specific hypothesis testing. Operations data is historically difficult to obtain since even large research organizations that can afford a full-scope simulator facility encounter sample size issues. Licensed operators are expensive, and their time is fully committed to the nuclear power plant that employs them. As such, it is very difficult to perform research on nuclear control room operations with sufficient sample sizes to approach statistical significance and draw generalizable conclusions applicable to different designs. Therefore, an alternative population using a simplified simulator offers an approach to evaluate human factors issues. Through numerous studies, the Rancor microworld has proven to be an effective means to leverage inexpensive and ubiquitous student participants to expand the data collection capability and build a corpus of human performance data to inform advanced reactor control system designs and human reliability modeling. This paper provides an overview of the Rancor microworld studies and describes the benefits and disadvantages of using novice participants in simplified simulator environments in contrast with licensed operators in full-scope simulator environments.

Thomas Ulrich, Ronald Boring, Roger Lew
Open Access
Article
Conference Proceedings

Measuring driving simulator adaptation using EDA

Most research about simulator adaptation focuses on driving style and participants' comfort. However, in recent years, there has been a growing interest in physiological data analysis as part of the user experience (UX) assessment. Furthermore, the application of machine learning (ML) techniques to those data may allow the automatic detection of stress and cognitive load. Previously, we noticed that new participants in experiments with our simulator were often in a constant state of tension. This prevented optimal training of our ML models as many of the collected data were not representative of a person's normal state. Our work focuses on improving the driver's UX by keeping cognitive load and stress at levels that do not interfere with the primary task of driving. We use a custom-made driving simulator as our testing platform and evaluate participants' emotional state with physiological signals, specifically electrodermal activity (EDA). EDA is the variation of the skin conductance created by sweat glands. It is linked to the sympathetic nervous system and is an indication of physiological and psychological arousal. We selected EDA because several studies have shown that it is a fast indicator of stress and cognitive load. To ensure that we are consistently collecting accurate data that could be fed to ML algorithms, we need to be able to correlate physiological reactions to external stimuli. We want to avoid confusing these reactions with general tension. Therefore, we need to determine the time it takes for most participants to physiologically adapt to our simulator. In this between-subjects study, we examined the impact of short (ca. 10 min) exposures to the simulation and compared it with a longer exposure period (ca. 35 min). Another problem we faced was that some participants were too indisposed by driving in the simulator to complete testing sessions. Therefore, we needed to find a way to identify them during the recruitment process.
The literature has shown that there might be a link between motion sickness and simulator sickness, and in this study we searched for a correlation between the motion sickness susceptibility questionnaire (MSSQ) and self-reported simulator sickness measured with the simulator sickness questionnaire (SSQ). For our investigation, we recruited 22 people through an agency. They were divided into two groups. Group A (short-time exposures) had 10 participants between 25 and 69 years old (M=49.5; SD=17.1, 5 women, 5 men) and group B (long-time exposure) had 12 people between 28 and 65 years old (M=43; SD=12.8, 5 women, 7 men). We asked the agency to recruit only active drivers of automatic-transmission cars, as our simulator mimics this type of vehicle. Motion sickness susceptibility and discomfort felt in the simulator are moderately correlated. The coefficient value is 0.51. Because the number of participants in our study was small, further research is necessary to determine if the MSSQ can be used as a discriminator in the recruitment phase. In addition, we can conclude that a longer exposure of 35 min results overall in better physiological adaptation.

Marie-Anne Pungu Mwange, Fabien Rogister, Luka Rukonic
Open Access
Article
Conference Proceedings

Impact of Camera Perspective and Image Throughput on Human Trust of a Quadrupedal Robot Scout

The objective of this study is to understand user perceptions of robot behaviors. Specifically, we are interested in the possible effects of providing the user with different camera perspectives and with regular snapshots versus a continuous camera feed in the context of a small-unit military operation. The study will employ a mixed 2 (camera perspective: 1st person vs over the shoulder 3rd person) x 2 (camera feed: snapshots vs continuous) factorial design, with participants viewing a robot performing military tasks in both rural and urban operational settings. After viewing the robot’s performance, participants will answer performance questions based on the context of the military mission, as well as questionnaires that measure trust in the autonomous system. Dependent variables include performance outcomes from tactical performance questions and subjective results of the trust questionnaires. Data from participants will be analyzed with a 2x2 between subjects ANOVA. We anticipate that the findings will suggest that a third person perspective and continuous camera feed will result in the highest trust and best performance outcomes.

Ralph Brewer, Zachary Guyton, Tyler Long, Angela Vantreese, Mason Russell, Chad Kessens, Ericka Rovira
Open Access
Article
Conference Proceedings

Evaluation of Human-Autonomy Team Trust for Weaponized Robotic Combat Vehicles

Phase I of the Soldier Operational Experiment was held at Fort Carson, Colorado in 2020, to assess the current capabilities of a collaborative team of a manned vehicle and an unmanned weaponized vehicle during live fire gunnery operations and situational training exercises. Here we discuss the performance of the crews during these exercises, and the implementation of team trust metrics to evaluate crew dynamics in these human-autonomy lethality teams. The gunnery exercise performance scores demonstrated that teams were often able to achieve qualifying scores on the relevant gunnery standards. However, subjective measures showed relatively low to moderate levels of trust across crew members. Through further analysis we found that Soldiers opted to perform many tasks manually and were slow to adapt to and use the technologies, even with substantial training on the systems. One possible reason for this response is that the technology was early in its development cycle and completely new to the users. Linguistic analyses were conducted on the crew communication in order to provide a more fine-grained analysis of the team dynamic. Results indicated that higher performing crews used more formal communication with words associated with perception (e.g., seeing, hearing, etc.). In line with previous field studies through the Wingman Joint Capabilities Technology Demonstration, this study further validated a multi-method approach to understanding performance, trust, and cohesion in human-autonomy teams.

Ralph Brewer, Anthony Baker, Catherine Neubauer, Andrea Krausman, Daniel Forster, Angelique Scharine, Samantha Berg, Kristi Davis, Kristin Schaefer
Open Access
Article
Conference Proceedings

Ergonomics, digital twins and time measurements for optimal workplace design

Ergonomics and Human Factors are both defined as a scientific discipline concerned with understanding the interactions between workers and other elements of a system. The implementation of ergonomics in industrial engineering, where workers are an integral part of the system, is very important in the development phase of the product/production and also in the planning of production technologies. The interaction between man and machine can be very intense in mass production, especially in assembly lines, and is therefore the focus of process optimization. In addition, appropriate workplace design has long-term effects on the worker. It is well known that it can prevent musculoskeletal complaints, increase productivity and reduce production costs. As part of the current trend of Industry 4.0 (I4.0), the traditional approach to workplace design is becoming intertwined with "smart" paradigms such as sensors, computing platforms, communication technology, control, simulation, data-intensive modelling, and predictive engineering. It is therefore important for companies to understand the great potential of the I4.0 concept and leverage its benefits in terms of moving from machine-dominated manufacturing to digital manufacturing. These technologies offer us the possibility to reproduce the work environment in a virtual scenario where it is possible to simulate manual tasks, evaluate ergonomic indices and perform time analysis at the same time. The idea of using ergonomic simulation software is not new. Several attempts have been made in Europe in the past, starting with DELTA's ERGOMAS and ERGOMan systems, followed by Siemens Jack and, more recently, Process Simulate, both possibly supported by an Xsens suit. With the I4.0 paradigm in mind, we examined the featured computing platforms developed from 1994 to the present to track the progress and changes made.
For simulations, the most progress was made with the development of the Task Simulation Builder interface, and later an important step was made with the development of sensor technology for motion capture. For example, for assembly lines, an integrated approach for setting working times was developed using the classical MTM approach and EAWS methods. With these technologies and accumulated knowledge, the design process changed rapidly, and several published papers show the benefits of computer-aided approaches for timing analysis as well. Based on the presented facts, the question arose: can computer-aided approaches integrated with ergonomics replace the existing standardised approaches for time determination? In our research, a case study of workplace design was conducted using two of the latest platforms, Siemens Jack and Process Simulate, in conjunction with an Xsens suit. A collaborative human-robot workplace was designed as a digital twin and tested in our lab with 6 subjects, considering their anthropometric measurements. The human movements were transferred into the software and evaluated using OWAS analysis for ergonomics and the MTM method for timing. The results of the research will help us to evaluate a similar approach carried out with two different computer platforms and to answer the question of the usefulness and reliability of the presented platforms for time analysis as well.

Natasa Vujica-Herzog, Borut Buchmeister, Matic Breznik
Open Access
Article
Conference Proceedings

Toward a Systems Framework Coupling Safety Culture, Risk Perception, and Hazard Recognition for the Mining Industry

The United States mining industry has made steady progress to improve worker safety and reduce injuries. Despite these gains, the industry remains largely reactive in its approach to health and safety. There remains a primary focus on lagging indicators, such as the numbers of injuries, hours lost, and hazards found at the worksite. To facilitate a more proactive approach, new methods are needed to evaluate hazardous conditions and unsafe behaviors. This work explores the relationships among mine workers’ hazard recognition abilities, the individual’s perception of risk, and the safety culture of the mining workplace. We have conducted a literature review to identify key factors and analytical models in industries where health and safety are a major consideration, including construction, manufacturing, mining, and transportation. Our analysis considered both process-oriented frameworks, such as Systems Thinking approaches, and statistical methods, including Structural Equation Modeling (SEM). A meta-model was then developed to aggregate and examine key factors and potential causal relationships. We discuss the creation of this meta-model, identifying notable structural characteristics and hypotheses for future confirmatory analysis. Use cases are then outlined, including descriptive, evaluative, and generative applications.

Leonard Brown, Ngan Pham, Jefferey Burgess
Open Access
Article
Conference Proceedings

Toward Understanding Development of Team Resilience during Stress Exposure Training

The demand for understanding stress resilience in Soldiers has continued unabated for decades. In this paper we applied the Bowers et al. (2017) team resilience model to test hypotheses about whether U.S. Army squads participating in a three-day Stress Exposure Training would respond with resilient stress reactions, positive team and learning climate attitudes, and learning outcomes. Anxiety, depression, hostility, sensation seeking, and positive affect showed mild to strong indications of resilient “bounce back” after scenario-based training, and positive team attitudes emerged early in training and remained high. Soldiers who reported higher team cohesion and learning climate scored higher on a post-training knowledge test. These findings indicate that individual and team resilience are emergent states and that multiple measures of individual and team attitudes and behaviors are critical for diagnosing a team's development over time. Recommendations for future research are discussed.

Joan Johnston, Debra Patton
Open Access
Article
Conference Proceedings

Optimizing Human Capital Performance: Influence of Simulation

Simulations have been employed to train people and provide novel environments to practice and test new skills as well as experiment with new concepts and procedures. The US Department of Defense (DoD) spends millions of dollars each year to provide both live and virtual training to military personnel. Realizing that simulations offer a plethora of opportunities, the DoD is now spending millions of dollars to design and develop what it believes will be the optimal versions of synthetic training environments to train its workforce. Each of the military services has a slightly different view of how simulation will or should support them in the future. This paper aims to provide readers with insights about the needed human requirements and the path that the services are on to achieve their future visions with respect to simulation. It will briefly discuss historical, functional, and future views of how simulations have been, are being, and are envisioned to support the optimizing of human performance.

Glenn Hodges, Debbie Patton, Samantha Napier
Open Access
Article
Conference Proceedings

Distinguishing Between Dynamic Altitude Breathing Threats to Improve Training

Breathing-related adverse physiological conditions are a prominent Warfighter pilot problem (Inspector General 2020). As a result of an investigation citing multiple types of adverse physiological conditions with various causes and symptoms (DoN 2017), there have been changes to training requirements to broaden the focus to include Dynamic Altitude Breathing Threat Training (DoN 2020). However, there remain questions about symptom definitions, distinctiveness, and response procedures that influence the content of this new training. In order to investigate the effects of different breathing conditions, the authors propose a between-subjects design with adjustments to breathing conditions (i.e., restricted oxygen, restricted inhalation, restricted exhalation) using a mask-on breathing device. Dependent measures include physiological data and pilot symptomology. The objective of this investigation is to inform awareness training for dynamic altitude breathing threats by validating instructional strategies and standard operating procedures for training implementation. Authors' Note: The views of the authors expressed herein do not necessarily represent those of the U.S. Navy or Department of Defense (DoD). Presentation of this material does not constitute or imply its endorsement, recommendation, or favoring by the DoD. NAWCTSD Public Release 22-ORL021 Distribution Statement A. Approved for public release; distribution is unlimited.

Kylie Fernandez, Mitchell Tindall, Beth Atkinson, Daniel Logsdon, Emily Anania
Open Access
Article
Conference Proceedings

Toward the Development of A Realistic, Low-Cost “Gender Retrofit Kit” For Use In Combat Medicine Training

Background: Bystanders often hesitate when rendering first aid to females, particularly when it requires disrobing the individual (Leary et al., 2018). In addition to the delayed application of first aid, the lifesaver’s actual task performance may also be less effective than when treating injured males. This can occur, for example, when the lifesaver does not fully expose the wound (Bell et al., 2020). The Army has invested heavily in the acquisition of realistic patient manikins for training combat medicine skills. However, given logistical constraints, it will be difficult to acquire an equal number of female patient manikins. Therefore, the purpose of this study was to develop and test a low-cost manikin “Gender Retrofit Kit” (GRK). The GRK included a breast “vest” that is affixed to the torso, a realistic vagina that is affixed to the groin, a wig, facial makeup, and instructions for “feminizing” the manikin’s appearance. Method: We recruited a convenience sample of 36 Combat Lifesavers and Combat Medics who were completing their recurrent annual training. At the end of their scheduled training, the participants were invited to practice three medical procedures (treatment of penetrating trauma to the leg via tourniquet, treatment of gunshot wound via application of a chest seal, and treatment of tension pneumothorax via needle chest decompression). Of the three medical procedures, only the last two required disrobing the patient. Therefore, we hypothesized that if performance issues were to occur, they would be localized here. The participants practiced the three procedures using two different manikin types: a standard male manikin and the GRK-outfitted manikin. The order of manikin presentation was counterbalanced. Measures of task quality, task completion times, and usability questionnaires were collected. Results and Conclusions: The sample was primarily male (78%), and included nearly equal numbers of Combat Lifesavers (42%) and Combat Medics (44%).
A post-simulation questionnaire suggested no significant mean differences between the standard vs. GRK manikins with regard to the simulators’ perceived realism, anatomical correctness, or ability to provide meaningful skills practice. However, statistically higher mean scores were reported for questionnaire items that focused on the female manikin’s realistic breast tissue, realistic skin texture, and feminine facial appearance. Linear mixed models were used to separately test the effects of participant gender (or job title), manikin gender, and their interaction on both task performance speed and quality. There were no statistically significant differences in task completion order or speed of task completion. All participants performed the three tasks in accordance with the Army’s MARCH-E algorithm, and all had similar completion times. With regard to the quality of task performance, the analyses revealed only one statistically significant main effect of manikin gender: the GRK manikin had a lower mean task performance score for the treatment of gunshot wounds, which required disrobing the patient to apply the chest seal. Based on the results of this exploratory study, we are prioritizing potential improvements to the GRK, and are planning a more rigorously-controlled validation study with the revised prototype. Additional implications and lessons learned will be discussed.

Angela Alban, Cheryl Coiro, Trisha Patel, Jeffrey Beaubien, Mark Mazzeo
Open Access
Article
Conference Proceedings

An Analysis of Squad Communication Behaviors during a Field-Training Exercise to Support Tactical Decision Making

Understanding how teams function in dynamic environments is critical for advancing theories of team development. In this paper, we compared communication behaviors of high and low performing U.S. Army squads that completed a field training event designed to assess tactical decision-making skills and performance under stress. Transcribed audio logs of U.S. Army squad communications were analyzed. A series of 2 (performance group) by 2 (time: Pre-Contact and Post-Contact) mixed-model ANOVAs were conducted to determine whether team communication behaviors changed for squads after coming under duress from hostile contact. Significant main effects for time were found for several communication labels indicating communication patterns differed as task complexity and stressors increased. Significant interaction effects were found between time and performance group for the number of commands given by squad leaders and overall speech frequency. Results highlight the value of examining communications at a granular level as adaptive patterns may otherwise be overlooked.

Jason Saville, Randall Spain, Joan Johnston, James Lester
Open Access
Article
Conference Proceedings