Deep Learning of Latent Gaze Representations for Cognitive Ability and Mental State Estimation
Abstract
This paper proposes a novel pipeline for estimating individual cognitive processing ability from eye-tracking data, using the latent representations of a deep-learning model trained to predict trial-level correctness during problem-solving. Ten undergraduate participants completed four cognitive tasks: reading comprehension, visuospatial reasoning, memory recall, and focused attention. Binocular gaze data were recorded using a Tobii Pro eye-tracking device. A Transformer-based sequence model was trained to classify each 100-timestep gaze sequence as correct or incorrect; its architecture was selected through a 200-trial Bayesian search that used the silhouette score of the resulting latent space as the objective. The optimal architecture achieved a validation accuracy of 0.725 and produced a 32-dimensional latent representation per trial. Univariate logistic regression identified the three most cognitively informative latent dimensions (z_0028, z_0022, and z_0031), each of which independently achieved a classification accuracy of 0.700–0.712. Within the resulting 3D subspace, the per-participant centroids of correct- and incorrect-trial embeddings exhibited consistent directional displacement along the primary cognitive axis, providing an interpretable subject-level index of cognitive processing ability without requiring any external standardized assessment. In a supplementary longitudinal experiment with a single participant, the session-level centroid shifted substantially toward and beyond the correct-trial region following a task-specific training intervention, suggesting that the proposed representation is sensitive to training-induced cognitive changes. Although the latent space exhibited weak global cluster separation and the training experiment remains preliminary, these findings support the viability of gaze-based latent centroid tracking as a non-invasive biomarker for both static cognitive profiling and longitudinal cognitive change detection.
Keywords: Concept Activation Vector, Latent Space, Eye Tracking, Cognitive Ability Estimation, Transformer, Gaze-based Assessment
DOI: 10.54941/ahfe1007332
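The centroid-based index described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tuple layout of `trials`, the choice of `axis=0` as the "primary cognitive axis", and the toy coordinates are all assumptions made for illustration. The sketch only shows the general idea of taking, for each participant, the difference between the centroid of correct-trial embeddings and the centroid of incorrect-trial embeddings in a low-dimensional latent subspace.

```python
from collections import defaultdict
from statistics import fmean


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [fmean(axis_values) for axis_values in zip(*points)]


def displacement_index(trials, axis=0):
    """Displacement from the incorrect- to the correct-trial centroid, per participant.

    `trials` is an iterable of (participant_id, is_correct, embedding) tuples,
    where `embedding` holds one trial's coordinates in the selected latent
    subspace (hypothetical layout, assumed for this sketch).
    Returns {participant_id: (displacement_vector, component_along_axis)}.
    Participants with no correct or no incorrect trials are skipped.
    """
    grouped = defaultdict(lambda: {True: [], False: []})
    for pid, is_correct, emb in trials:
        grouped[pid][bool(is_correct)].append(emb)

    result = {}
    for pid, by_label in grouped.items():
        if not by_label[True] or not by_label[False]:
            continue  # a centroid difference needs both trial classes
        c_correct = centroid(by_label[True])
        c_incorrect = centroid(by_label[False])
        delta = [a - b for a, b in zip(c_correct, c_incorrect)]
        result[pid] = (delta, delta[axis])
    return result


# Toy data, invented purely for illustration (not taken from the study).
toy_trials = [
    ("P01", True,  [1.0, 0.0, 0.0]),
    ("P01", True,  [3.0, 2.0, 0.0]),
    ("P01", False, [0.0, 1.0, 0.0]),
    ("P02", True,  [0.5, 0.0, 1.0]),
    ("P02", False, [0.5, 0.0, 3.0]),
]
index = displacement_index(toy_trials, axis=0)
```

In this toy example the hypothetical participant P01 is displaced along the first axis, while P02 is displaced only along the third. A consistent sign of the axis component across participants would correspond to the "consistent directional displacement" the abstract reports, though how the paper defines the primary axis is not specified here.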
AHFE Open Access