Lightweight Driver Drowsiness Detection Model Using MediaPipe Blendshapes and a Dual-Attention Hierarchical BiLSTM
Abstract
Driver drowsiness contributes to approximately 10%–20% of road accidents worldwide. Camera-based fatigue detection systems usually involve a tradeoff: simple models based on geometric thresholds often miss subtle early signs of sleepiness, whereas more accurate deep-learning models rely on computationally heavy raw-pixel processing. This paper presents a lightweight, computationally efficient alternative for real-time edge devices. Instead of processing raw video frames, the proposed system uses the Google MediaPipe Face Landmarker to extract a compact vector of facial blendshape coefficients. These kinematic features are processed by a dual-attention hierarchical bidirectional long short-term memory (BiLSTM) network. To capture both quick blink events and gradual fatigue, the model analyzes 3,600-frame (2 min) video segments using a sliding-window approach that evaluates localized 50-frame (approximately 1.7 s) microchunks with a 25-frame stride. During training, rather than forcing fatigue into a strict binary classification, the architecture models drowsiness as a continuous progression using soft target probabilities. This allows the network to learn the gradual temporal patterns of early-onset fatigue, such as changes in blink behavior over time, and to generalize across the different individuals in the dataset. Evaluated on the unfiltered UTA-RLDD dataset with an early-detection threshold of 0.35, the model achieved a window-level accuracy of 86.90%, a video-level accuracy of 88.89%, and a safety-critical window-level sensitivity of 91.14%. Finally, this paper proposes a hardware architecture for a closed-loop haptic mitigation seat and establishes a foundation for future simulator-based validation studies.
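The windowing arithmetic described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `microchunk_starts` and `split_into_microchunks` are hypothetical, and the sketch assumes that only complete 50-frame microchunks are kept and that no padding is applied at the end of a segment. Under those assumptions, a 3,600-frame segment with a 25-frame stride yields 143 overlapping microchunks.

```python
def microchunk_starts(n_frames, chunk_len=50, stride=25):
    """Start indices of every complete microchunk in a segment.

    Assumption: trailing frames that cannot fill a whole chunk are dropped.
    """
    if n_frames < chunk_len:
        raise ValueError("segment is shorter than one microchunk")
    return list(range(0, n_frames - chunk_len + 1, stride))


def split_into_microchunks(segment, chunk_len=50, stride=25):
    """Slice a per-frame sequence into overlapping microchunks.

    `segment` is any sliceable sequence with one entry per frame, for
    example a list of blendshape vectors or a 2-D array of shape
    (n_frames, n_features).
    """
    starts = microchunk_starts(len(segment), chunk_len, stride)
    return [segment[s:s + chunk_len] for s in starts]


# Stand-in data: one placeholder value per frame instead of real blendshapes.
segment = list(range(3600))
chunks = split_into_microchunks(segment)
print(len(chunks))     # 143 microchunks: (3600 - 50) / 25 + 1
print(len(chunks[0]))  # 50 frames per microchunk
```

With a stride of half the chunk length, each frame away from the segment boundaries falls into two consecutive microchunks, so a blink that straddles one chunk boundary is still seen whole in the neighboring chunk.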
Keywords: Driver Drowsiness, Deep Learning, Hierarchical BiLSTM, Dual-level Attention Mechanism, MediaPipe, Blendshapes, Ocular Dynamics, Driver State Monitoring
DOI: 10.54941/ahfe1007333
AHFE Open Access