Deep Learning of Latent Gaze Representations for Cognitive Ability and Mental State Estimation

Open Access
Article
Conference Proceedings
Authors: Shunpei Kiuchi, Masami Matsushima, Keiichi Watanuki
Abstract

This paper proposes a novel pipeline for estimating individual cognitive processing ability from eye-tracking data, using the latent representations of a deep-learning model trained to predict trial-level correctness during problem-solving. Ten undergraduate participants completed four cognitive tasks: reading comprehension, visuospatial reasoning, memory recall, and focused attention. Binocular gaze data were recorded using a Tobii Pro eye-tracking device. A Transformer-based sequence model was trained to classify each 100-timestep gaze sequence as correct or incorrect; it was optimized via a 200-trial Bayesian search that used the silhouette score of the resulting latent space as its objective. The optimal architecture achieved a validation accuracy of 0.725 and produced a 32-dimensional latent representation per trial. Univariate logistic regression identified the three most cognitively informative latent dimensions (z_0028, z_0022, and z_0031), each of which independently achieved a classification accuracy of 0.700–0.712. Within the resulting 3D subspace, the per-participant centroids of correct- and incorrect-trial embeddings exhibited consistent directional displacement along the primary cognitive axis, providing an interpretable subject-level index of cognitive processing ability without any external standardized assessment. In a supplementary longitudinal experiment with a single participant, the session-level centroid shifted substantially toward and beyond the correct-trial region following a task-specific training intervention, suggesting that the proposed representation is sensitive to training-induced cognitive changes. Although the latent space exhibited weak global cluster separation and the training experiment remains preliminary, these findings support the viability of gaze-based latent centroid tracking as a non-invasive biomarker for both static cognitive profiling and longitudinal cognitive change detection.
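The centroid-displacement index described above can be illustrated with a minimal sketch. The abstract does not give the exact computation, so everything here is an assumption made for illustration: the function names `centroid` and `displacement_index`, the use of axis 0 as a stand-in for the primary cognitive axis, and the toy embeddings, which are not data from the study. Each trial is taken to be a point in the 3D subspace (z_0028, z_0022, z_0031), and the index is taken to be the correct-trial centroid minus the incorrect-trial centroid along one axis.

```python
from collections import defaultdict
from statistics import fmean


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [fmean(component) for component in zip(*points)]


def displacement_index(trials, axis=0):
    """Per-participant centroid displacement along one latent axis.

    trials: iterable of (participant_id, is_correct, embedding) tuples.
    Returns {participant_id: correct centroid minus incorrect centroid on `axis`}.
    Participants lacking correct or incorrect trials are skipped.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for participant_id, is_correct, embedding in trials:
        groups[participant_id][bool(is_correct)].append(embedding)

    index = {}
    for participant_id, by_label in groups.items():
        if not by_label[True] or not by_label[False]:
            continue  # displacement is undefined without both classes
        correct_centroid = centroid(by_label[True])
        incorrect_centroid = centroid(by_label[False])
        index[participant_id] = correct_centroid[axis] - incorrect_centroid[axis]
    return index


# Hypothetical toy embeddings, not data from the study.
toy_trials = [
    ("P01", True, [1.0, 0.0, 0.0]),
    ("P01", True, [3.0, 2.0, 0.0]),
    ("P01", False, [0.0, 1.0, 1.0]),
    ("P02", True, [0.5, 0.5, 0.5]),
    ("P02", False, [0.0, 0.5, 0.5]),
    ("P02", False, [-1.0, 0.5, 0.5]),
    ("P03", True, [1.0, 1.0, 1.0]),  # only correct trials, so skipped
]

index = displacement_index(toy_trials, axis=0)
for participant_id, value in sorted(index.items()):
    print(participant_id, value)
```

Under this reading, a larger positive value means that a participant's correct-trial embeddings sit further along the chosen axis than their incorrect-trial embeddings. The same function could be applied per session to follow a centroid over time, again as an assumption about how such tracking might be set up.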

Keywords: Concept Activation Vector, Latent Space, Eye Tracking, Cognitive Ability Estimation, Transformer, Gaze-based Assessment

DOI: 10.54941/ahfe1007332


Volume: Affective and Pleasurable Design