Multi-Scale Spatiotemporal Attention-Based Sign Language Recognition
Open Access
Article
Conference Proceedings
Authors: Xiao Hui Hou, Zi Han Mei, Yingxiao Han, Zidan Sun
Abstract: Over 430 million people globally face significant communication barriers due to hearing loss, yet existing Sign Language Recognition (SLR) technologies often overlook critical multimodal integration, resulting in limited practical usability. To address this issue, we propose a Multi-Scale Spatiotemporal Attention-Based SLR system that combines advanced deep learning techniques, including Graph Convolutional Networks (GCNs), multi-scale CNNs, and dual attention mechanisms, to fuse multimodal data effectively and enhance real-time gesture interpretation. A comprehensive usability evaluation was conducted with 28 diverse participants, integrating quantitative measures (accuracy, latency, false activations) and qualitative assessments (System Usability Scale (SUS) ratings, user interviews). The primary objective was to evaluate the extent to which technical improvements translate into meaningful enhancements in user experience and system acceptance. Results demonstrated substantial performance improvements (95.8% accuracy, 0.8 s latency per gesture) and outstanding usability (average SUS score of 82.5). User feedback highlighted that system responsiveness and intuitive error correction significantly increased satisfaction and trust, underscoring the importance of combining technical accuracy with user-centered design. This study confirms that an integrated focus on multimodal recognition and rigorous usability evaluation is essential for the successful real-world deployment of SLR technologies.
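The abstract names the main architectural ingredients (GCN-based skeleton features, multi-scale CNN appearance features, and dual attention for multimodal fusion) without implementation detail. The sketch below is a minimal, hypothetical illustration of how such a dual-attention fusion stage could be wired up; the class name DualAttentionFusion, the feature dimensions, and the use of PyTorch are assumptions made for illustration and are not taken from the paper.

```python
# Minimal sketch of a dual-attention fusion block for two per-frame gesture
# streams (e.g. skeleton embeddings from a GCN and appearance embeddings from
# a multi-scale CNN). Names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class DualAttentionFusion(nn.Module):
    """Fuses two per-frame feature streams with channel attention and
    temporal attention before gesture classification."""

    def __init__(self, feat_dim: int = 256, num_classes: int = 100):
        super().__init__()
        # Channel-wise attention over the concatenated modalities.
        self.channel_attn = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, 2 * feat_dim),
            nn.Sigmoid(),
        )
        # Temporal attention: one relevance score per frame.
        self.temporal_attn = nn.Linear(2 * feat_dim, 1)
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, skel_feats: torch.Tensor, rgb_feats: torch.Tensor) -> torch.Tensor:
        # skel_feats, rgb_feats: (batch, time, feat_dim) per-frame embeddings
        # produced upstream by a GCN and a multi-scale CNN, respectively.
        x = torch.cat([skel_feats, rgb_feats], dim=-1)      # (B, T, 2*D)
        x = x * self.channel_attn(x)                        # re-weight channels per frame
        w = torch.softmax(self.temporal_attn(x), dim=1)     # (B, T, 1) frame weights
        clip = (w * x).sum(dim=1)                           # attention-pooled clip vector
        return self.classifier(clip)                        # (B, num_classes) gesture logits


if __name__ == "__main__":
    fusion = DualAttentionFusion(feat_dim=256, num_classes=100)
    skel = torch.randn(4, 32, 256)   # 4 clips, 32 frames, 256-d skeleton features
    rgb = torch.randn(4, 32, 256)    # matching appearance features
    print(fusion(skel, rgb).shape)   # torch.Size([4, 100])
```

Splitting attention into a channel re-weighting step and a temporal pooling step lets a model of this kind emphasize the more reliable modality per feature channel and the most discriminative frames per gesture, which is the usual motivation for dual attention in SLR pipelines.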
Keywords: Sign Language Recognition, Usability, User Experience, Attention Mechanism, Multimodal Integration
DOI: 10.54941/ahfe1006677