Five-level Drowsiness Estimation Using BlendShape Features Captured by a Smartphone’s Front-facing Camera
Open Access
Article
Conference Proceedings
Authors: Shunki Suzuki, Hisaya Tanaka
Abstract: This study proposes a real-time, non-contact drowsiness estimation method using a smartphone’s front-facing camera and Apple’s ARKit. While previous studies have relied primarily on eye blinks or on facial cues limited to the eyes and mouth, our approach captures a wider range of facial expressions, head sway, and eye movement using 52 BlendShape parameters, which cover regions such as the eyebrows, cheeks, and nose, along with 3D orientation data. Participants performed a driving task designed to induce drowsiness in a simulator environment, and external raters evaluated their drowsiness on a five-point scale. These ratings were then used as labels to train a K-nearest neighbor (KNN) classifier on features derived from mean values and temporal variances of facial indicators, sampled every five seconds. To enhance model interpretability, SHapley Additive exPlanations (SHAP) were employed to quantify the contribution of each indicator to the classification results. Results show that the amount of movement and the standard deviation of indicators, rather than their absolute position, were strongly associated with higher classification accuracy. Mouth-related indicators, such as yawning and lip movement, showed particularly high contributions to drowsiness prediction. In addition to the five-class task, we performed binary and ternary classification by downsampling from the original five-class dataset. The proposed method achieved classification accuracies of 98.6%, 89.6%, and 70.5% for the binary, ternary, and five-class settings, respectively, with F1 scores of up to 99.3%. These findings suggest that smartphones equipped with ARKit can serve as reliable and accessible tools for detecting drowsiness using facial expression dynamics. Importantly, temporal variation in facial movements, especially head sway and eye closure patterns, proved to be more robust than static features in distinguishing levels of alertness.
Future work will optimize feature selection to reduce computational load and improve classification performance, particularly for fine-grained tasks such as five-level drowsiness estimation.
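The feature pipeline described in the abstract (per-window means and variability of facial indicators, fed to a KNN classifier) can be illustrated with a minimal sketch. It is not the authors' implementation: the window length in frames, the use of population standard deviation, Euclidean distance, and k = 3 are all assumptions made for illustration, and the SHAP analysis is omitted. The function names `window_features` and `knn_predict` are hypothetical.

```python
from collections import Counter
from math import dist
from statistics import mean, pstdev


def window_features(frames, window_size):
    """Summarize each non-overlapping window as per-channel means followed by std devs.

    frames: list of equal-length lists, one per captured frame, e.g. BlendShape
    coefficients plus orientation angles. window_size is a frame count
    (an assumption; the abstract only states a five-second sampling interval).
    """
    features = []
    for start in range(0, len(frames) - window_size + 1, window_size):
        window = frames[start:start + window_size]
        channels = list(zip(*window))  # transpose: one tuple per channel
        means = [mean(ch) for ch in channels]
        stds = [pstdev(ch) for ch in channels]
        features.append(means + stds)
    return features


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: two channels standing in for ARKit's jawOpen and eyeBlinkLeft
# coefficients, four frames per window. The values are invented.
steady = [[0.1, 0.2]] * 4
fluctuating = [[0.0, 0.1], [0.8, 0.9], [0.0, 0.1], [0.8, 0.9]]
toy_features = window_features(steady + fluctuating, window_size=4)
```

In this toy input the second window has larger standard deviations than the first, which is the kind of temporal variation the abstract reports as more informative than absolute position. Drowsiness labels from the raters (1 to 5) would be passed as `train_y` alongside the feature vectors.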
Keywords: Drowsiness Estimation, Smartphone, Face Tracking, ARKit, Machine Learning
DOI: 10.54941/ahfe1006877


AHFE Open Access