Ergonomic Problem and Solution Identification by Applying Image Captioning with Embedded Ergonomic Knowledge
Open Access
Article
Conference Proceedings
Authors: Gunwoo Yong, Quan Miao, Meiyin Liu, Sanghyun Lee
Abstract: Work-related musculoskeletal disorders (WMSDs) are a primary cause of non-fatal injuries across diverse industries. Traditional manual identification of ergonomic problems and solutions for reducing WMSDs is time-consuming and limited by expert availability. Image captioning, which interprets images of workers and their workplaces and captures the interactions therein, is one potential alternative. Yet, lacking ergonomic knowledge, conventional image captioning models struggle to generate accurate captions of ergonomic problems and solutions. We therefore aim to automatically identify ergonomic problems and solutions from images by applying image captioning embedded with an ergonomic knowledge graph. Specifically, we developed an ergonomic knowledge graph encoder and incorporated it into a state-of-the-art image captioning model. Comparative testing on eight ergonomic problem-solution pairs showed that our model outperformed the state-of-the-art baseline. This result highlights the critical role of integrating ergonomic knowledge into image captioning models, paving the way for broader workplace applications to reduce WMSDs.

To this end, we first crafted ergonomics knowledge graphs based on elements essential for identifying ergonomic problems and solutions, such as ergonomic risk factors and task information. Next, we built a pipeline that identifies the pre-built ergonomics knowledge graphs from images through object detection and pose estimation. Finally, we modified an image captioning model to interpret images based on our knowledge graphs. We used an instruction-tunable image captioning model as the backbone: while traditional models generate captions solely from images, instruction-tunable models can generate captions conditioned on both images and textual instructions.
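For illustration, an ergonomics knowledge graph of the kind described above can be sketched as subject-relation-object triples linking task information to risk factors and mitigations. All entity and relation names below are hypothetical placeholders, not the paper's actual schema:

```python
# Hypothetical sketch of an ergonomics knowledge graph as
# (subject, relation, object) triples; the entities, relations,
# and risk factors are illustrative assumptions, not the paper's schema.
knowledge_graph = [
    ("worker", "performs", "manual_lifting"),
    ("manual_lifting", "involves_object", "heavy_box"),
    ("manual_lifting", "has_risk_factor", "back_bending"),
    ("back_bending", "causes", "lower_back_WMSD"),
    ("back_bending", "mitigated_by", "lift_with_bent_knees"),
]

def risk_factors(graph, task):
    """Collect risk-factor nodes attached to a given task node."""
    return [o for s, r, o in graph if s == task and r == "has_risk_factor"]

def solutions(graph, risk):
    """Collect mitigation (solution) nodes attached to a risk factor."""
    return [o for s, r, o in graph if s == risk and r == "mitigated_by"]

for risk in risk_factors(knowledge_graph, "manual_lifting"):
    print(risk, "->", solutions(knowledge_graph, risk))
# prints: back_bending -> ['lift_with_bent_knees']
```

In the paper's pipeline, a graph like this would be selected per image (via object detection and pose estimation) and passed to the captioning model as a non-textual instruction.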
Specifically, we harnessed InstructBLIP for its robust performance and modified it to accept non-textual instructions, i.e., ergonomics knowledge graphs.

For training and testing, we collected 2,000 images from various real workplaces. We associated each image with one of our knowledge graphs and annotated it with a caption describing an ergonomic problem and its corresponding solution according to a NIOSH ergonomic guideline. Our dataset thus consists of 2,000 triplets, each comprising an image, a knowledge graph, and a caption.

The bilingual evaluation understudy (BLEU) metric was used to assess image captioning performance in identifying ergonomic problems and solutions. BLEU quantifies the similarity between artificial intelligence (AI)-generated captions and human-generated captions on a scale from 0 to 1. To demonstrate that the ergonomics knowledge graph improves the identification of ergonomic problems and solutions, we compared our model with the backbone model, which generates captions without ergonomics knowledge graphs. On the BLEU-4 score, which compares sequences of four consecutive words, our model scored 0.834, while the baseline model scored 0.712. The higher BLEU score indicates that the captions generated by our knowledge graph-enhanced model are closer to the ergonomic problems and solutions specified in the ergonomic guideline than those generated by the existing model.

This result demonstrates the feasibility of image captioning with embedded ergonomics knowledge for automated identification of ergonomic problems and solutions from images. Our accessible automated approach is designed to help reduce potential WMSDs by enabling intervention in hazardous workplaces where ergonomics knowledge or ergonomic experts are scarce.
Keywords: Ergonomics, Artificial Intelligence, Image Captioning
DOI: 10.54941/ahfe1005582