Deep Learning of Latent Gaze Representations for Cognitive Ability and Mental State Estimation

Shunpei Kiuchi; Masami Matsushima; Keiichi Watanuki

doi:10.54941/ahfe1007332

AHFE International

Open Access

Article

Conference Proceedings

Abstract

This paper proposes a novel pipeline for estimating individual cognitive processing ability from eye-tracking data using the latent representations of a deep-learning model trained to predict trial-level correctness during problem solving. Ten undergraduate participants completed four cognitive tasks: reading comprehension, visuospatial reasoning, memory recall, and focused attention. Binocular gaze data were recorded using a Tobii Pro eye-tracking device. A Transformer-based sequence model, optimized via a 200-trial Bayesian search using the silhouette score of the resulting latent space as the objective, was trained to classify each 100-timestep gaze sequence as correct or incorrect. The optimal architecture achieved a validation accuracy of 0.725 and produced a 32-dimensional latent representation per trial. Univariate logistic regression identified the three most cognitively informative latent dimensions (z_0028, z_0022, and z_0031), each independently achieving a classification accuracy of 0.700–0.712. Within the resulting 3D subspace, the per-participant centroids of correct- and incorrect-trial embeddings exhibited consistent directional displacement along the primary cognitive axis, providing an interpretable subject-level index of cognitive processing ability without any external standardized assessment. A supplementary longitudinal experiment with a single participant further showed that the session-level centroid shifted substantially toward and beyond the correct-trial region following a task-specific training intervention, suggesting that the proposed representation is sensitive to training-induced cognitive changes. Although the latent space exhibited weak global cluster separation and the training experiment remains preliminary, these findings support the viability of gaze-based latent centroid tracking as a non-invasive biomarker for both static cognitive profiling and longitudinal cognitive change detection.

Keywords: Concept Activation Vector, Latent Space, Eye Tracking, Cognitive Ability Estimation, Transformer, Gaze-based Assessment
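The centroid-displacement index described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact computation, so this assumes the displacement is simply the correct-trial centroid minus the incorrect-trial centroid for each participant in the selected 3D subspace. The function names, participant IDs, and toy embeddings below are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import fmean


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [fmean(axis) for axis in zip(*points)]


def centroid_displacements(embeddings, participants, correct):
    """Map each participant to (correct centroid - incorrect centroid).

    Assumption: displacement is a plain difference of class centroids.
    Participants missing either class are skipped.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for z, pid, ok in zip(embeddings, participants, correct):
        groups[pid][bool(ok)].append(z)

    displacements = {}
    for pid, by_label in groups.items():
        if not by_label[True] or not by_label[False]:
            continue  # a centroid needs at least one trial per class
        c_ok = centroid(by_label[True])
        c_bad = centroid(by_label[False])
        displacements[pid] = [a - b for a, b in zip(c_ok, c_bad)]
    return displacements


# Toy 3D embeddings (hypothetical values, one row per trial).
embeddings = [
    [1.0, 0.0, 0.0], [3.0, 0.0, 0.0],  # P01, correct trials
    [0.0, 0.0, 0.0], [0.0, 2.0, 0.0],  # P01, incorrect trials
    [2.0, 1.0, 1.0],                   # P02, correct trial
    [1.0, 1.0, 1.0],                   # P02, incorrect trial
]
participants = ["P01"] * 4 + ["P02"] * 2
correct = [1, 1, 0, 0, 1, 0]

displacements = centroid_displacements(embeddings, participants, correct)

# Treat the first coordinate as the "primary axis" purely for illustration.
primary_axis_index = {pid: d[0] for pid, d in displacements.items()}
print(displacements)
print(primary_axis_index)
```

Reading off one coordinate of the displacement vector is only one possible way to turn it into a scalar index; projecting onto a learned direction would be another, and the abstract does not say which the authors used.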


Volume: Affective and Pleasurable Design