Lightweight Driver Drowsiness Detection Model Using MediaPipe Blendshapes and a Dual-Attention Hierarchical BiLSTM

Suhas Mangalesh Chavan; Kazunori Kaede; Keiichi Watanuki

doi:10.54941/ahfe1007333

AHFE International

Open Access

Article

Conference Proceedings

Abstract

Driver drowsiness contributes to approximately 10%–20% of global road accidents. Camera-based fatigue detection systems usually face a tradeoff: simple models based on geometric thresholds often miss subtle early signs of sleepiness, whereas more accurate deep-learning models rely on computationally heavy raw-pixel processing. This paper presents a lightweight, computationally efficient alternative for real-time edge devices. Instead of processing raw video frames, the proposed system uses the Google MediaPipe Face Landmarker to extract a streamlined vector of facial blendshape coefficients. These kinematic features are processed by a dual-attention hierarchical bidirectional long short-term memory (BiLSTM) network. To capture both quick blink events and gradual fatigue over time, the model analyzes 3,600-frame (2 min) video segments using a sliding-window approach that evaluates localized 50-frame (1.6 s) microchunks with a 25-frame stride. During training, rather than forcing fatigue into a strict binary classification, the architecture models drowsiness as a continuous progression using soft target probabilities. This allows the network to learn the gradual temporal patterns of early-onset fatigue, such as changes in blink behavior over time, and it allowed the system to generalize across different individuals in the dataset. Evaluated on the unfiltered UTA-RLDD dataset with an early-detection threshold of 0.35, the model achieved a window-level accuracy of 86.90%, a video-level accuracy of 88.89%, and a safety-critical window-level sensitivity of 91.14%. Finally, the paper proposes a hardware architecture for a closed-loop haptic mitigation seat and establishes a foundation for future simulator-based validation studies.
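
The sliding-window arithmetic in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, the per-frame feature vector is a placeholder of arbitrary length, and the 30 fps frame rate is only inferred from 3,600 frames spanning 2 min (at that rate 50 frames last roughly 1.67 s, which the abstract gives as 1.6 s).

```python
def microchunk_starts(n_frames, chunk_len=50, stride=25):
    """Return the start index of every full microchunk in a segment.

    chunk_len and stride default to the values stated in the abstract;
    partial chunks at the end of the segment are dropped (an assumption).
    """
    if n_frames < chunk_len:
        raise ValueError("segment is shorter than one microchunk")
    return list(range(0, n_frames - chunk_len + 1, stride))


def split_into_microchunks(frames, chunk_len=50, stride=25):
    """Slice a list of per-frame feature vectors into overlapping microchunks."""
    starts = microchunk_starts(len(frames), chunk_len, stride)
    return [frames[s:s + chunk_len] for s in starts]


# Placeholder segment: 3,600 frames, each with a dummy 3-value feature vector
# (the real blendshape vector length is not stated in the abstract).
segment = [[0.0, 0.0, 0.0] for _ in range(3600)]
chunks = split_into_microchunks(segment)

fps = 3600 / 120  # assumed frame rate implied by "3,600-frame (2 min)"
print(len(chunks), len(chunks[0]), round(50 / fps, 2))
```

Under these assumptions a 3,600-frame segment yields 143 overlapping microchunks, the last of which starts at frame 3,550, so each 2 min segment reaches the upper level of the hierarchy as a sequence of 143 chunk-level inputs.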

Keywords: Driver Drowsiness, Deep Learning, Hierarchical BiLSTM, Dual-level Attention Mechanism, MediaPipe, Blendshapes, Ocular Dynamics, Driver State Monitoring

Volume: Affective and Pleasurable Design